HEPATITIS VIRUS
Introduction
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins. Bioinformatics is limited to sequence, structural, and functional
analysis of genes and genomes and their corresponding products, and is often
considered computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.

The areas of sequence analysis include sequence alignment, sequence database
searching, motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, and genome assembly and
comparison. Structural analyses include protein and nucleic acid structure
analysis, comparison, classification, and prediction. Functional analyses
include gene expression profiling, protein-protein interaction prediction,
protein subcellular localization prediction, metabolic pathway
reconstruction, and simulation. The three aspects of bioinformatics analysis
are not isolated but often interact to produce integrated results. For
example, protein structure prediction depends on sequence alignment data;
clustering of gene expression profiles requires the use of phylogenetic tree
construction methods derived in sequence analysis; and sequence-based
prediction is related to the functional analysis of co-expressed genes.

The first major bioinformatics project was undertaken by Margaret Dayhoff in
1965, who developed the first protein sequence database, called the Atlas of
Protein Sequence and Structure. Subsequently, in the early 1970s, the
Brookhaven National Laboratory established the Protein Data Bank for
archiving three-dimensional protein structures. At its onset, the database
stored fewer than a dozen protein structures, compared to more than 30,000
structures today. The first sequence alignment algorithm was developed by
Needleman and Wunsch in 1970. This was a fundamental step in the development
of the field of bioinformatics, which paved the way for the routine sequence
comparisons and database searching practiced by modern biologists.
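The global alignment idea introduced by Needleman and Wunsch can be sketched as a small dynamic-programming routine. This is a minimal illustration only: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not parameters taken from the original paper or from any particular tool.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("ACGT", "AGT")` aligns the full length of both sequences and has to place one gap in the shorter one. Practical tools replace the fixed match and mismatch values with a substitution matrix.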
A recent advance in bioinformatics is molecular modeling, which is aimed at
understanding structure-function and structure-property relationships in
physico-chemical processes and pharmaceuticals, and thus has become
increasingly important for finding and designing new drugs. In fact,
computers are playing an important role in new drug discovery and drug
design.
HEPATITIS

Hepatitis (plural hepatitides) implies injury to the liver characterized by
the presence of inflammatory cells in the liver tissue. Etymologically, the
term comes from the ancient Greek hepar or hepato-, meaning "liver", and the
suffix -itis, denoting "inflammation". The condition can be self-limiting,
healing on its own, or can progress to scarring of the liver. Hepatitis is
acute when it lasts less than 6 months and chronic when it persists longer.
A group of viruses known as the hepatitis viruses cause most cases of liver
damage worldwide. Hepatitis can also be due to toxins (notably alcohol),
other infections, or an autoimmune process.

It may run a subclinical course, in which the affected person may not feel
ill. The patient becomes unwell and symptomatic when the disease impairs
liver functions, which include, among other things, screening of harmful
substances, regulation of blood composition, and production of bile to help
digestion.
Causes

Acute hepatitis
Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes
simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus,
adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted
fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline,
and many others
Ischemic hepatitis (circulatory insufficiency)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis
Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C
(hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha-1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally
mimic chronic hepatitis
Viral hepatitis
A virus is a particle which is smaller than a bacterium and contains complex
genetic information in the form of DNA or RNA. This genetic material allows
the virus to infect bacteria or living cells and set up the machinery to
reproduce itself, leading to destruction of the cell in which it resides. To
date, five viruses, labeled A through E, have been identified which appear
to cause viral hepatitis. Viruses A and E can be contracted from
contaminated water or food (by mouth), while viruses B, C, and D are
transmitted by direct injection into the bloodstream (through any method of
injection under the skin). The term viral hepatitis describes any one of the
illnesses caused by the five viruses mentioned, and consists of an infection
of liver cells which leads to damage of the liver, over days in some cases
but over many years in others. Thirty years ago, none of the hepatitis
viruses had been identified. In the 1960s, transfusion-related viral
hepatitis was extremely common, with 30% of patients receiving blood
products becoming infected. By 1970, a blood test for the Australia antigen
was developed, which appeared to identify those infected with one hepatitis
virus, which we now call hepatitis B. The investigator who discovered the
Australia antigen, the protein which makes up the coat of the virus and
which is now called the hepatitis B surface antigen (HBsAg), was awarded the
Nobel Prize. Our understanding of viral hepatitis has grown tremendously
since the discovery of the Australia antigen.
Currently, 11 viruses are recognized as causing hepatitis. Two are herpes
viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses. EBV and CMV cause mild, self-resolving forms of
hepatitis with no permanent hepatic damage. Both viruses cause the typical
infectious mononucleosis picture of fatigue, nausea, and malaise.

Of the nine human hepatotropic viruses, only five are well characterized.
Hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E
(formerly called enterically transmitted NANB hepatitis) are transmitted by
fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A,
non-B hepatitis), and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children. Adults, especially pregnant women, may develop more
severe disease. Although convalescence may be prolonged, there is no chronic
form of the disease. Fulminant hepatitis is rare (0.1% of cases). The virus
enters via the gut, replicates in the alimentary tract, and spreads to
infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women, in whom the mortality rate is high (up to 40%). Similar to
hepatitis A, the virus replicates in the gut initially before invading the
liver, and virus is shed in the stool prior to the onset of symptoms.
Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet about its epidemiology; the incidence of
infection appears to be low in first world countries.
Hepatitis C
Originally described as a putative togavirus related to the flaviviruses and
pestiviruses; it is now classified in the family Flaviviridae. Thus probably
enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees.
Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of
individuals develop chronic infection following exposure, which may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy, and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B. It increases the severity of liver disease in
hepatitis B carriers. The virus particle is 36 nm in diameter and is
encapsulated with HBsAg derived from HBV; the delta antigen is associated
with the virus particles. It has a ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to be
a major agent of liver disease. It has been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the bloodstream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients
are called healthy carriers. This means that, although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis, and occasionally
liver cell cancer. These outcomes are linked to the virus and its effects,
although it is unlikely that the virus directly causes cancer. Those
patients who develop hepatitis (damage to liver cells with inflammation) do
so on account of the body's normal inclination to attack the foreign
proteins contained in viruses and in the cells in which the viruses are
found. This process, called the immune response, determines the pace and the
severity of the liver cell injury in this condition, and will be described
in more detail below.

Since the identification of the hepatitis B virus, several other viruses
which are nearly identical have been identified in Eastern woodchucks,
ground squirrels, and Peking ducks. The members of this virus family, termed
the hepadnaviruses, have similar life cycles to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.

Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown

Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.

Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver,
and virus particles, as well as excess viral surface protein, are shed in
large amounts into the blood. Viraemia is prolonged and the blood of
infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies and young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis: the virus persists, but there is minimal liver
damage.
Chronic active hepatitis: there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma (HCC)
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma. HBV is thought to play a role in the development
of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission: perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution

Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and
indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
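The marker-by-marker rules listed above can be collected into a small lookup routine. This is only a sketch of the statements in this section, with a hypothetical function name and argument names; it does not combine markers into diagnoses and is not a validated clinical algorithm.

```python
def interpret_hbv_serology(*, hbsag=False, hbeag=False, anti_hbs=False,
                           anti_hbe=False, core_igm=False, core_igg=False):
    """Return the interpretation attached to each positive serum marker.

    Each argument is True when the corresponding marker is detected.
    The wording of every note follows the list in the text above.
    """
    rules = [
        (hbsag, "HBsAg: virus replication is occurring in the liver"),
        (hbeag, "HBeAg: a high level of viral replication is occurring"),
        (anti_hbs, "anti-HBs: immunity following infection"),
        (anti_hbe, "anti-HBe: viral replication has fallen, low infectivity"),
        (core_igm, "core IgM: recent infection"),
        (core_igg, "core IgG: exposure to HBV"),
    ]
    # Keep the notes in the same order as the markers are listed in the text.
    return [note for positive, note in rules if positive]
```

For example, `interpret_hbv_serology(hbsag=True, hbeag=True, core_igm=True)` returns three notes, which together describe an active, recently acquired infection with a high level of replication.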
Fig. Hepatitis B virus in serum
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived: prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg: made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue, and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and occasionally
leads to hospitalization, but acute hepatitis B resolves completely in 95%
of those infected.

Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their on-going
infection, although some have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome

The genome is circular, double-stranded, and 3.2 kb in size. It has a
compact organization, with four overlapping reading frames running in one
direction and no noncoding regions. The minus strand is unit length and has
a protein covalently attached to its 5' end. The other strand, the plus
strand, is variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for the
transcription and expression of seven different hepatitis B proteins. These
proteins are transcribed and translated through the use of multiple in-frame
start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
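The way a single reading frame can yield more than one protein through multiple in-frame start codons can be illustrated with a short sketch. The function name and the toy sequence are hypothetical and are not taken from the HBV genome; the code simply lists every ATG that lies in frame with the first codon and the length of the product each would give, all ending at the same stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_products(orf):
    """List (start_offset, length_in_codons) for every in-frame ATG.

    `orf` is a DNA string read in frame from its first base. Products are
    nested: each shares the same stop codon and differs only at its start.
    """
    orf = orf.upper()
    # Split into whole codons, ignoring any trailing partial codon.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    # Index of the first stop codon (or the end if none is present).
    end = next((k for k, c in enumerate(codons) if c in STOP_CODONS),
               len(codons))
    return [(k * 3, end - k) for k, c in enumerate(codons[:end]) if c == "ATG"]


# Toy ORF with three in-frame start codons: ATG GCC ATG AAA ATG TTT TAA
print(in_frame_products("ATGGCCATGAAAATGTTTTAA"))
```

With this toy input the three start codons sit at offsets 0, 6, and 12 and give products of 6, 4, and 2 codons. The large, middle, and small surface proteins described later in the text are related to one another in this nested way.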
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell
which is capable of supporting its replication. Although hepatocytes are
known to be the most effective cell type for replicating HBV, other types of
cells in the human body have been found to support replication to a lesser
degree.

The initial steps following HBV entry are not clearly defined, although it
is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified
(Garces HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase
II of a longer-than-genome-length RNA, called the pregenome, and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum,
and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where
viral DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.

Early after infection, the DNA is recirculated to the nucleus, where the
process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm, and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually compose a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to as a group named surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein. The absence of the hepatitis
B core, polymerase, and genome makes these particles non-infectious. High
levels of these non-infectious particles can be found during the acute phase
of the infection. Since the non-infectious particles present the same sites
as the virion, they induce a significant immune response and are thought to
be disadvantageous for the virus. However, it is also believed that the
presence of high levels of non-infectious particles may allow the infectious
viral particles to travel undetected by antibodies through the bloodstream
(Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:

Hepatitis B surface antigen (HBsAg): There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis B
surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen (Au
antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope which may be responsible for
triggering the immune response. Despite the high antigenicity and prevalence
of these particles, the immune system appears largely oblivious to their
presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185-amino-acid protein expressed
in the cytoplasm of infected cells, and is closely associated with
nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in
circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.

Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains 3 distinct antigen particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. The
Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or body secretions.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens. While
neonatal tolerance probably plays an important role in viral persistence in
patients infected at birth, the basis for poor responsiveness in adult-onset
infection is not well understood and requires further analysis. Viral
evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development of
DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed, with an
emphasis on past accomplishments as well as goals for the future.
Homology or comparative modeling involves the prediction of the structure of
a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the refinement
of the structure in an attempt to account for inherent differences between
the template and query structures. As mentioned above, homology modeling
figures heavily as a rationale for structural genomics initiatives, under
the stated assumption that accurate models can be built for query sequences
that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.) Advances
in the accuracy of sequence alignments using structure-based profile
methods, such as those described above, should result in continuing
improvements in the quality of homology models.
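The 30% rule is stated in terms of percent sequence identity, which can be computed from a pairwise alignment as sketched here. The function names are hypothetical, and the choice of denominator (aligned positions with no gap in either sequence) is one of several conventions in use, so the number from a given alignment program may differ slightly.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned, ungapped positions.

    Both strings must come from the same pairwise alignment, so they have
    equal length, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True when the identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For instance, `percent_identity("ACDEFG", "ACDQFG")` is five matches out of six columns, or about 83%, which is well above the threshold.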
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making
optimal use of chemical information represents an efficient knowledge-based
strategy for improving binding affinity estimations, ligand binding-mode
predictions, and virtual screening enrichments obtained from protein-ligand
docking.
This review gives an introduction to ligand-receptor docking and illustrates
the basic underlying concepts. An overview of different approaches and
algorithms is provided. Although the application of docking and scoring has
led to some remarkable successes, there are still some major challenges
ahead, which are outlined here as well. Approaches to address some of these
challenges and the latest developments in the area are presented. Some
aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are
described.
Material and methods
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high
level of annotation (such as the description of the function of a
protein, its domain structure, post-translational modifications,
variants, etc.), a minimal level of redundancy, and a high level of
integration with other databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
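The optimal local alignment idea behind SSEARCH can be illustrated with a minimal Smith-Waterman sketch. This is not the SSEARCH implementation: the function name and the simple match, mismatch and linear gap scores are assumptions chosen for illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Only the score is returned here; recovering the aligned residues would additionally require a traceback through the full matrix.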
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
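As an illustration of how one of these parameters is derived from the sequence alone, the sketch below computes the aliphatic index with the formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is hypothetical and the code is a minimal sketch, not the ProtParam source.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code."""
    seq = "".join(sequence.split()).upper()  # white space is ignored
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    # Coefficients 2.9 and 3.9 weight the larger side chains of Val and Ile/Leu.
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    print(round(aliphatic_index("MAAALQVLPRLARAPLHPLLWRGSVARLASS"), 2))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of greater thermostability.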
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure through a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
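The agreement between matched Cα atoms quoted above is normally expressed as a root-mean-square deviation (RMSD). The sketch below computes it for two equally long lists of coordinates, assuming the two structures have already been superimposed; the function name and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(model_coords, template_coords):
    """RMSD in the input units between matched, pre-superimposed C-alpha atoms."""
    if len(model_coords) != len(template_coords) or not model_coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_coords, template_coords):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model_coords))


if __name__ == "__main__":
    # Toy coordinates in angstroms, not taken from any real structure.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    template = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, template))
```

In practice an optimal superposition step precedes this calculation; it is left out here to keep the example short.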
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
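The E-value rule of thumb can be sketched as a simple filter over the hit list. The hits below are the five PDB hits reported in the results of this work; the E-value of the last two is read here as 6.0, which is an assumption because the decimal point is missing in the listing. The cut-off of 1e-5, the `Hit` type and the function name are likewise illustrative assumptions, not part of BLAST.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template: str   # PDB ID and chain
    score: float    # bit score
    e_value: float


HITS = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),   # E-value assumed to be 6.0
    Hit("1AW9-A", 27, 6.0),   # E-value assumed to be 6.0
]


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cut-off, best (lowest) E-value first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit.template, hit.score, hit.e_value)
```

With this cut-off only the three hepatitis B capsid structures remain as candidate templates, while the two high E-value hits are discarded.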
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor, or a
flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. This plot is
drawn between the torsion angles phi and psi. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
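Each point on the plot comes from two torsion angles, and a torsion angle is defined by four consecutive atoms (C, N, Cα, C for φ and N, Cα, C, N for ψ). The following is a minimal sketch of that calculation from Cartesian coordinates; the function names and the toy coordinates are hypothetical and no particular structure library is assumed.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Toy coordinates: the outer atoms lie on opposite sides (trans arrangement).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applying the function to every residue of a model gives the (φ, ψ) pairs that a validation program places on the plot.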
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
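Gradient optimization can be illustrated on the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones potential, a common model for van der Waals interactions. This is a sketch in reduced units; the function names, step size, displacement cap and tolerance are assumptions and do not correspond to any particular minimization package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the pair energy with respect to r (minus the force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r, step=0.01, max_move=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent: move downhill along the gradient until the force is small."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        move = -step * gradient
        # Cap the displacement so a huge repulsive force cannot overshoot wildly.
        move = max(-max_move, min(max_move, move))
        r += move
    return r


if __name__ == "__main__":
    start = 1.0                       # atoms too near each other
    best = minimize_separation(start)
    print(lj_energy(start), best, lj_energy(best))
```

The minimizer only walks downhill from the starting point, so on a real molecule with many minima it would stop in the nearest one, as described above.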
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
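The two extinction coefficients can be checked from the composition above (5 Tyr, 4 Trp, 5 Cys) with the usual 280 nm contributions of 1490 per Tyr, 5500 per Trp and 125 per cystine. The sketch below is a hand check under those assumptions, with hypothetical function names; it is not the ProtParam code.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M-1 cm-1."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_0_1_percent(coefficient, molecular_weight):
    """Absorbance of a 1 g/l solution in a 1 cm path."""
    return coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported above
    for paired in (True, False):
        ext = extinction_coefficient(5, 4, 5, paired)
        print(ext, round(absorbance_0_1_percent(ext, mw), 3))
```

Both reported pairs of values (29700 with 0.538, and 29450 with 0.533) are reproduced by this calculation.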
Secondary structure prediction
By SOPMA
Result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
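A guide tree of this kind is normally written in Newick notation, where each leaf is a `name:branch_length` pair and parentheses group sister taxa. The sketch below is a minimal illustration, not the parser used by the alignment program: it assumes the colons, commas and decimal points lost in extraction follow standard Newick, and the example string is a hypothetical excerpt covering four of the sequences above.

```python
def parse_newick_leaves(newick):
    """Return (leaf_name, branch_length) pairs from a Newick string.

    Text that follows a closing parenthesis is an internal-node label or
    length and is skipped; only leaves are reported.
    """
    leaves = []
    token = ""
    prev = ""  # last structural character seen: "(", ",", ")" or ""
    for ch in newick.strip().rstrip(";") + ",":
        if ch in "(),":
            token = token.strip()
            if token and prev != ")":
                name, _, length = token.partition(":")
                leaves.append((name, float(length) if length else None))
            token = ""
            prev = ch
        else:
            token += ch
    return leaves


# Hypothetical excerpt of the guide tree (punctuation assumed, see lead-in).
EXAMPLE_TREE = (
    "((P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844);"
)

for name, length in parse_newick_leaves(EXAMPLE_TREE):
    print(name, length)
```

Short branch lengths between two leaves in the same parenthesised pair, such as the two duck HBV sequences, indicate closely related sequences.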
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
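Percent identity between two aligned sequences can be computed by counting matching columns. The sketch below is illustrative only: it assumes identity is measured over columns where neither sequence has a gap (other tools use different denominators), and it uses one block of the HBVA4 and HBVA3 rows from the multiple alignment rather than the actual target and template pair.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns without a gap in either row."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# One alignment block from the multiple alignment (13 leading gap columns).
GAP_PREFIX = "-" * 13
ROW_HBVA4 = GAP_PREFIX + "VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP"
ROW_HBVA3 = GAP_PREFIX + "VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP"

print(round(percent_identity(ROW_HBVA4, ROW_HBVA3), 1))
```

In this block the two rows differ at three of 47 ungapped columns, so the function reports roughly 93.6% identity.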
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: request submission form
Please fill in these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in the protein. The distance between two successive alpha-carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the protein of interest can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
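Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four points. It is a standalone illustration, not the code used by PROCHECK or SAVS, and the coordinates in the usage lines are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a planar cis arrangement, then the last atom
# rotated a quarter turn around the central bond.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Applying `dihedral` to the φ and ψ atom quadruples of every residue gives the coordinates that the plot places into favoured, allowed and disallowed regions.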
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
                                                          ----   -----
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
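The percentages in the plot statistics follow directly from the residue counts, and the quality criterion is a simple threshold on the first of them. The short sketch below recomputes them; it assumes the decimal points lost in extraction (for example reading 856 as 85.6), and the dictionary keys are shorthand labels chosen for illustration.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the denominator for the percentages.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Procheck's rule of thumb: over 90% of residues in the most favoured regions.
meets_quality_criterion = percentages["most favoured"] > 90.0

print(total, percentages, meets_quality_criterion)
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline quoted in the summary.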
Docking result (by Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASPN   Radius = 1.40  Charge = -0.52
  ASPCA  Radius = 1.50  Charge = 0.25
  ASPC   Radius = 1.40  Charge = 0.53
  ASPO   Radius = 1.50  Charge = -0.50
  ASPCB  Radius = 1.70  Charge = -0.21
  ASPCG  Radius = 1.40  Charge = 0.62
  ASPH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSCE  Radius = 1.70  Charge = 0.22
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSCE  Radius = 1.70  Charge = 0.22
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSCE  Radius = 1.70  Charge = 0.22
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
  ASPN   Radius = 1.40  Charge = -0.52
  ASPCA  Radius = 1.50  Charge = 0.25
  ASPC   Radius = 1.40  Charge = 0.53
  ASPO   Radius = 1.50  Charge = -0.50
  ASPCB  Radius = 1.70  Charge = -0.21
  ASPH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
  LYSN   Radius = 1.40  Charge = -0.52
  LYSCA  Radius = 1.50  Charge = 0.23
  LYSC   Radius = 1.40  Charge = 0.53
  LYSO   Radius = 1.50  Charge = -0.50
  LYSCB  Radius = 1.70  Charge = 0.04
  LYSCG  Radius = 1.70  Charge = 0.05
  LYSCD  Radius = 1.70  Charge = 0.05
  LYSCE  Radius = 1.70  Charge = 0.22
  LYSH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
  MSEN   Radius = 1.40  Charge = -0.52
  MSECA  Radius = 1.50  Charge = 0.14
  MSEC   Radius = 1.40  Charge = 0.53
  MSEO   Radius = 1.50  Charge = -0.50
  MSECB  Radius = 1.70  Charge = 0.04
  MSECG  Radius = 1.70  Charge = 0.09
  MSESE  Radius = 1.90  Charge = 0.32
  MSECE  Radius = 1.90  Charge = 0.01
  MSEH   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
  THRN   Radius = 1.40  Charge = -0.52
  THRCA  Radius = 1.50  Charge = 0.27
  THRC   Radius = 1.40  Charge = 0.53
  THRO   Radius = 1.50  Charge = -0.50
  THRCB  Radius = 1.50  Charge = 0.21
  THRH   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
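The size of the search space reported in the Hex log can be checked with a little arithmetic. The sketch below is only a consistency check, not part of Hex: it assumes the garbled log digits group as 27 distance steps, 812 receptor rotations and 1 ligand rotation, with a 64 x 24 x 48 FFT grid, and the variable names are chosen here for illustration.

```python
def distance_steps(start, stop, step):
    """Intermolecular distances sampled from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


# Distance range 0.00 to 19.50 in steps of 0.75, as read from the log.
steps = distance_steps(0.0, 19.5, 0.75)

receptor_rotations = 812  # assumed reading of "Receptor 812"
ligand_rotations = 1      # assumed reading of "Ligand 1"
fft_grid = 64 * 24 * 48   # assumed reading of the FFT dimensions

fft_count = len(steps) * receptor_rotations * ligand_rotations
orientations = fft_count * fft_grid

print(len(steps), fft_count, orientations)
```

Under these assumptions the result reproduces the 21924 3D FFTs and the 1616412672 orientations that the log reports.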
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, each has an important role in survival in different host species. It would be interesting to take a closer look at the matter at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will become possible in the near future to draw firm conclusions about the true picture of evolution, and the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein present in the different species, we carried out homology modelling and went a step further by presenting a theoretical model built with available online modelling tools.
In this study the HBeAG (Glycerate kinase) protein encoded by the gene is treated as one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D. et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Introduction
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins. Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems.
These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis. The areas of sequence analysis include sequence alignment, sequence database searching, motif and pattern discovery, gene and promoter finding, reconstruction of evolutionary relationships, and genome assembly and comparison. Structural analyses include protein and nucleic acid structure analysis, comparison, classification and prediction. Functional analysis includes gene expression profiling, protein-protein interaction prediction, protein subcellular localization prediction, and metabolic pathway reconstruction and simulation.
The three aspects of bioinformatics analysis are not isolated but often interact to produce integrated results. For example, protein structure prediction depends on sequence alignment data; clustering of gene expression profiles requires the use of phylogenetic tree construction methods derived in sequence analysis; and sequence-based prediction is related to functional analysis of co-expressed genes.
The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965, who developed the first protein sequence database, called the Atlas of Protein Sequence and Structure. Subsequently, in the early 1970s, the Brookhaven National Laboratory established the Protein Data Bank for archiving three-dimensional protein structures. At its onset, the database stored fewer than a dozen protein structures, compared to more than 30,000 structures today. The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970. This was a fundamental step in the development of the field of bioinformatics, which paved the way for the routine sequence comparisons and database searching practiced by modern biologists.
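The idea behind the Needleman and Wunsch algorithm is dynamic programming: the best score for aligning two prefixes is built from the best scores of shorter prefixes. The sketch below computes only the optimal global alignment score, keeping one row of the table at a time. The scoring values (match +1, mismatch -1, gap -1) and the example strings are illustrative assumptions, not parameters used in this project.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b."""
    # Row 0: aligning an empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GCATGCG", "GATTACA"))
```

Recovering the alignment itself, rather than only its score, requires keeping the full table and tracing back from the last cell.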
A recent advance in bioinformatics is molecular modeling, which aims at understanding structure-function and structure-property relationships in physico-chemical processes and pharmaceuticals, and has thus become increasingly important for finding and designing new drugs. In fact, computers are playing an important role in new drug discovery and drug design.
HEPATITIS
Hepatitis (plural hepatitides) implies injury to the liver characterized by the presence of inflammatory cells in the liver tissue. Etymologically, the term comes from ancient Greek hepar or hepato-, meaning 'liver', and the suffix -itis, denoting 'inflammation'. The condition can be self-limiting, healing on its own, or can progress to scarring of the liver. Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A group of viruses known as the hepatitis viruses cause most cases of liver damage worldwide. Hepatitis can also be due to toxins (notably alcohol), other infections or an autoimmune process.
It may run a subclinical course, in which the affected person may not feel ill. The patient becomes unwell and symptomatic when the disease impairs liver functions, which include, among other things, screening of harmful substances, regulation of blood composition and production of bile to help digestion.
Causes
Acute hepatitis
Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes
simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus,
adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted
fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline, and
many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C
(hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally
mimic chronic hepatitis
Viral hepatitis
A virus is a particle which is smaller than a bacterium and carries its
genetic information as DNA or RNA. This genetic material allows the virus to
infect bacteria or other living cells and set up the machinery to reproduce
itself, leading to destruction of the cell in which it resides. To date, five
viruses, labeled A through E, have been identified which appear to cause
viral hepatitis. Viruses A and E can be contracted from contaminated water
or food (by mouth), while viruses B, C, and D are transmitted by direct
injection into the bloodstream (through any method of injection under the
skin). The term viral hepatitis describes any one of the illnesses caused by
the five viruses mentioned and consists of an infection of liver cells which
leads to damage of the liver, over days in some cases but over many years in
others. Thirty years ago, none of the hepatitis viruses had been identified.
In the 1960s, transfusion-related viral hepatitis was extremely common, with
30% of patients receiving blood products becoming infected. By 1970, a
blood test called the Australia antigen was developed which appeared to
identify those infected with one hepatitis virus, which we now call hepatitis
B. The investigator who discovered the Australia antigen, the protein which
makes up the coat of the virus and which is now called the hepatitis B
surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding
of viral hepatitis has grown tremendously since the discovery of the
Australia antigen.
Currently, 11 viruses are recognized as causing hepatitis. Two are herpes
viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses. EBV and CMV cause mild, self-resolving forms of
hepatitis with no permanent hepatic damage. Both viruses cause the typical
infectious mononucleosis syndrome of fatigue, nausea, and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E
(formerly called enterically transmitted NANB hepatitis) are transmitted by
fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A,
non-B hepatitis), and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases. Virus enters via the
gut, replicates in the alimentary tract, and spreads to infect the liver,
where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
Worldwide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa, the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women, in whom the mortality rate is high (up to 40%). Similar to
hepatitis A, the virus replicates in the gut initially before invading the
liver, and virus is shed in the stool prior to the onset of symptoms.
Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet about the epidemiology. The incidence of
infection appears to be low in first world countries.
Hepatitis C
Putative Togavirus related to the Flavi and Pesti viruses; thus probably
enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period:
6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of
individuals develop chronic infection following exposure, which can lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy, and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B. Increased severity of liver disease in hepatitis B
carriers. Virus particle 36 nm in diameter, encapsulated with HBsAg derived
from HBV; delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to be
a major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the bloodstream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver, the damage being
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients
are called healthy carriers. This means that, although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis, and occasionally liver
cell cancer. These outcomes are linked to the virus and its effects, although
it is unlikely that the virus directly causes cancer. Those patients who
develop hepatitis (damage to liver cells with inflammation) do so on account
of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called
the immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses
which are nearly identical have been identified in Eastern woodchucks, ground
squirrels, and Peking ducks. The members of this virus family, termed the
hepadnaviruses, have life cycles similar to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis: the virus persists, but there is minimal liver
damage.
Chronic active hepatitis: there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission: perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence indicates
exposure to HBV.
Chronic carrier
Fig. Hepatitis B virus in serum
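The marker-by-marker reading of HBV serology given above can be written down
as a simple lookup. The following is only a sketch for illustration: the
dictionary keys, the function name, and the output format are assumptions,
the interpretations are paraphrased from the text, and the code is not a
diagnostic tool.

```python
# Interpretations paraphrased from the serology notes above.
# The marker labels used as keys are assumed names for illustration.
INTERPRETATION = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV",
}


def interpret_serology(positive_markers):
    """Return one interpretation line per positive marker.

    Lines are returned in the order the markers are listed in
    INTERPRETATION; unknown marker names raise ValueError.
    """
    positives = set(positive_markers)
    unknown = positives - set(INTERPRETATION)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [
        f"{marker}: {meaning}"
        for marker, meaning in INTERPRETATION.items()
        if marker in positives
    ]


if __name__ == "__main__":
    for line in interpret_serology({"HBsAg", "anti-HBc IgM"}):
        print(line)
```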
Prevention
1) Active immunization
Two types of vaccine are available:
Serum-derived: prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg: made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue, and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and occasionally
leads to hospitalization, but acute hepatitis B resolves completely in 95% of
those infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic carriers,
or those with chronic hepatitis B, are not aware of their ongoing infection,
although some have persistent fatigue.
Molecular virology
The genome is circular, double stranded, and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins occurs through the use of
multiple in-frame start codons. The HBV genome also contains elements that
regulate transcription, determine the site of polyadenylation, and mark a
specific transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell
which is capable of supporting its replication. Although hepatocytes are
known to be the most effective cell type for replicating HBV, other types of
cells in the human body have been found to be able to support replication to
a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through
recognition of a cell surface receptor that has yet to be identified (Garces
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer than genome length RNA, called the pregenome, and shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum,
and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where
viral DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where
the process is repeated, resulting in the accumulation of 10 to 30 molecules
of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This virion
has a diameter of 42 nm, and its outer envelope contains a high quantity of
hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid,
which is made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present, and
these usually outnumber the virions in a ratio of 100:1. These two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase, and genome causes these particles to have a non-infectious
nature. High levels of these non-infectious particles can be found during the
acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are
thought to be disadvantageous for the virus. However, it is also believed
that the presence of high levels of non-infectious particles may allow the
infectious viral particles to travel undetected by antibodies through the
bloodstream (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B surface antigen (HBsAg): There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest protein of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering
the immune response. Regardless of the high antigenicity and prevalence of
these particles, the immune system appears basically oblivious to their
presence.
Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185 amino acid protein expressed in
the cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg): The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains 3 distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. Of
these, the Dane particle is the infective form of the virus. Hepatitis B is
normally transmitted by blood transfusion, contaminated equipment, drug
users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
downregulation of viral gene expression and the infection of immunologically
privileged tissues. Chronic liver cell injury and the attendant inflammatory
and regenerative responses create the mitogenic and mutagenic stimuli for the
development of DNA damage that can cause hepatocellular carcinoma.
Elucidation of the immunological and virological basis for HBV persistence
may yield immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication strategy
of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed, with an
emphasis on past accomplishments as well as goals for the future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
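To make the 30% rule concrete, the following sketch computes percent sequence
identity from a pairwise alignment and compares it with the threshold. The
function names are hypothetical, and the definition used here (identical
residues divided by aligned, non-gap columns) is one assumed convention among
several in use; other tools may use a different denominator.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip gapped columns
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True if identity is above the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not real data.
    print(percent_identity("ACDE-FG", "ACDQWFG"))
```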
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of
current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its domain
structure, post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The
original FASTP program was designed for protein sequence similarity
searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs
in this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
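For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the
following is a minimal sketch of its local alignment recurrence. It is not
the SSEARCH implementation; the function name and the scoring values (match
+2, mismatch -1, linear gap -2) are assumptions chosen for illustration,
whereas real searches typically use substitution matrices and affine gap
penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Unlike global alignment, every cell is floored at zero, so an
    alignment may start and end anywhere in either sequence.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                   # start a new local alignment
                prev[j - 1] + pair,  # extend along the diagonal
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```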
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
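The idea of screening a library for sequences that resemble a query above a
threshold can be sketched as follows. This is a deliberately simplified
illustration and not the BLAST algorithm itself, which seeds on short words
and then extends and scores the hits statistically; here, library sequences
are only ranked by the number of distinct short words (k-mers) they share
with the query. The function names, the word length k, and the threshold are
assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_library(query, library, k=3, min_shared=2):
    """Return (name, shared_word_count) for library entries that share
    at least min_shared distinct k-mers with the query, best first."""
    query_words = kmers(query, k)
    hits = []
    for name, seq in library.items():
        shared = len(query_words & kmers(seq, k))
        if shared >= min_shared:
            hits.append((name, shared))
    # Highest count first; ties broken by name for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    # Toy sequences for illustration only.
    toy_library = {"s1": "MKTAYIAKQR", "s2": "GGGGGGGG", "s3": "TAYIAK"}
    print(search_library("MKTAYIAK", toy_library))
```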
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
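As an illustration of one of these parameters, the aliphatic index is commonly defined as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below is a minimal Python version under that assumption; the function name aliphatic_index is ours and is not part of ProtParam.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu.

    Assumes the commonly quoted coefficients 2.9 (Val) and 3.9 (Ile + Leu).
    """
    # Ignore white space and digits, as ProtParam does for raw sequences.
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    mole_pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


if __name__ == "__main__":
    # Toy sequence, not the protein studied in this report.
    print(round(aliphatic_index("AVIL GGGG 10"), 1))
```

Higher values indicate a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of increased thermostability.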
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. The original paper reports improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID 2B8N)
Loaded the template structure in pdb format and the target as a raw sequence in SWISS MODEL and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and obtained the docked structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
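Agreement between matched Cα atoms of a model and a reference structure is usually reported as a root-mean-square deviation (RMSD). The sketch below, in plain Python, shows how such a number could be computed for two lists of matched Cα coordinates that have already been superimposed; the function name ca_rmsd and the sample coordinates are illustrative assumptions, and no superposition step is performed.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. angstroms) between matched (x, y, z) points.

    Both lists must list equivalent C-alpha atoms in the same order and
    must already be in the same frame of reference.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: every atom of the model is displaced by 2 units.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```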
Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important is the coverage of the aligned regions: the fraction of the query sequence structure that can be predicted from the template, and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
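The E-value screening step described in this section can be sketched in a few lines of Python. Everything in the example is an assumption made for illustration: the cutoff of 1e-5, the dictionary layout and the hit identifiers are hypothetical and do not come from any particular BLAST output format.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cutoff, best (lowest) first.

    hits: list of dicts with at least the keys "id" and "evalue".
    max_evalue: illustrative cutoff; the right value depends on the study.
    """
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])


if __name__ == "__main__":
    # Hypothetical hit list, not taken from the results of this report.
    hits = [
        {"id": "templateA", "evalue": 2e-51},
        {"id": "templateB", "evalue": 6.0},
        {"id": "templateC", "evalue": 6e-54},
    ]
    for hit in select_templates(hits):
        print(hit["id"], hit["evalue"])
```

A filter of this kind only narrows the candidate list; coverage, function and the plausibility of the resulting model still have to be weighed before a template is chosen.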
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interfaces are therefore said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement which allows small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the free energy of activation (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of φ and ψ angles for a polypeptide. In a polypeptide the main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. The angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
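Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The sketch below shows one common way to compute a torsion angle from four points in plain Python. The function name dihedral_deg and the test coordinates are illustrative assumptions, not part of PROCHECK or any other tool mentioned here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points forming a right-angle twist about the z axis.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to every residue of a structure and plotting the resulting (φ, ψ) pairs gives the scatter that validation servers compare against the allowed regions.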
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run. The run time also depends on how many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file which has a file extension of .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
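The idea of moving atoms down the energy gradient can be illustrated with a deliberately small example: two atoms placed too close together, interacting through a Lennard-Jones potential, are relaxed by steepest descent on their separation. This is a sketch under stated assumptions (reduced units with epsilon = sigma = 1, a fixed step size and a cap on the move per iteration, all chosen for illustration); real minimizers work on all atomic coordinates with a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.001, max_move=0.05, tol=1e-8, max_iter=100000):
    """Steepest descent on the interatomic distance r, starting from r0."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:  # net force is negligible: converged
            break
        move = -step * g  # move against the gradient
        move = max(-max_move, min(max_move, move))  # cap large moves
        r += move
    return r


if __name__ == "__main__":
    start = 0.9  # atoms overlapping: strong repulsion
    end = minimize_distance(start)
    print(start, lj_energy(start))
    print(end, lj_energy(end))
```

The search stops at the nearest minimum of the curve, which mirrors the point made above: minimization relieves a bad local contact but does not cross energy barriers to look for a better minimum elsewhere.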
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp       27    6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                  27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   %
Ala (A)   74      14.1
Arg (R)   33      6.3
Asn (N)   11      2.1
Asp (D)   21      4.0
Cys (C)   5       1.0
Gln (Q)   32      6.1
Glu (E)   28      5.4
Gly (G)   51      9.8
His (H)   16      3.1
Ile (I)   15      2.9
Leu (L)   81      15.5
Lys (K)   10      1.9
Met (M)   12      2.3
Phe (F)   10      1.9
Pro (P)   28      5.4
Ser (S)   27      5.2
Thr (T)   22      4.2
Trp (W)   4       0.8
Tyr (Y)   5       1.0
Val (V)   38      7.3
Pyl (O)   0       0.0
Sec (U)   0       0.0
(B)       0       0.0
(Z)       0       0.0
(X)       0       0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
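The two extinction coefficients reported by ProtParam can be checked by hand from the Trp, Tyr and Cys counts in the composition above. The sketch below assumes the molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function name is ours.

```python
EXT_TRP = 5500      # M-1 cm-1 per tryptophan at 280 nm
EXT_TYR = 1490      # M-1 cm-1 per tyrosine at 280 nm
EXT_CYSTINE = 125   # M-1 cm-1 per cystine (disulfide-bonded Cys pair)


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired):
    """Molar extinction coefficient at 280 nm from residue counts.

    cys_paired=True assumes every possible pair of Cys forms a cystine;
    cys_paired=False assumes all Cys are reduced.
    """
    cystines = n_cys // 2 if cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


if __name__ == "__main__":
    mol_weight = 55252.6  # from the ProtParam output above
    for paired in (True, False):
        ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=paired)
        # Absorbance of a 1 g/l solution = extinction coefficient / molecular weight
        print(paired, ext, round(ext / mol_weight, 3))
```

With 4 Trp, 5 Tyr and 5 Cys this reproduces 29700 and 29450, and dividing by the molecular weight gives the Abs 0.1% values of 0.538 and 0.533.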
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
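The pairwise scores in this table are, to our understanding, percent identities: the number of identical residues divided by the number of aligned positions compared, with gapped positions left out. The sketch below illustrates that calculation for two rows of an alignment. It is an assumption-laden illustration, not ClustalW's own code, and the two fragments used are short excerpts from the HBVA4 and HBVA6 rows of the alignment, so the printed value is not expected to match the whole-sequence score.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over positions where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    hbva4_fragment = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
    hbva6_fragment = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
    print(percent_identity(hbva4_fragment, hbva6_fragment))
```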
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (pdb)
Choose File - Save - Project (pdb)
Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the Email ID for receiving the modeled structure)
Open the new structure (received by Email) and remove the template by selecting the target
Choose File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the allowed regions point to stereochemical problems in the model.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
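Each point on a Ramachandran plot is a pair of dihedral angles. For residue i, φ is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ is the dihedral defined by N(i), CA(i), C(i), N(i+1). The sketch below shows how one such angle can be computed from four atomic positions using the usual atan2 formulation. The helper names and the coordinates are made up for illustration and are not taken from the modeled structure.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Made-up coordinates: the fourth point is rotated a quarter turn
    # about the central bond relative to the first point.
    angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
    print(f"dihedral = {angle:.1f} degrees")
```

Applying this function to the backbone atoms of every residue gives the (φ, ψ) pairs that a validation server sorts into favoured, allowed and disallowed regions.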
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
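The percentages in the Procheck summary follow directly from the residue counts. The sketch below recomputes them, assuming the four region counts are 990, 104, 11 and 51 as read from the summary, and compares the most favoured fraction with the 90% guideline quoted by Procheck. The function and constant names are hypothetical.

```python
# Residue counts per Ramachandran region, as read from the Procheck summary.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
GOOD_MODEL_THRESHOLD = 90.0  # per cent of residues in most favoured regions

def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region, to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

def meets_threshold(counts, threshold=GOOD_MODEL_THRESHOLD):
    """True if the most favoured fraction exceeds the guideline."""
    return region_percentages(counts)["most favoured"] > threshold

if __name__ == "__main__":
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(f"{region:>20}: {pct:5.1f}%")
    print("meets 90% guideline:", meets_threshold(REGION_COUNTS))
```

With these counts the most favoured fraction is below the 90% guideline, so the model would benefit from further refinement before it is used for docking.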
Docking result (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
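The size of the search reported in the Hex log can be cross-checked with simple arithmetic. The sketch below assumes the run-together log values read as a distance range of 0.00 to 19.50 in steps of 0.75, 812 receptor rotations, 1 ligand rotation and an FFT grid of 64 x 24 x 48; these readings are assumptions about punctuation lost from the log, and all names in the code are hypothetical.

```python
# Values as read from the docking log (punctuation is assumed, see above).
R_MIN, R_MAX, R_STEP = 0.0, 19.5, 0.75
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)

def distance_steps(r_min, r_max, step):
    """Number of sampled intermolecular distances, both end points included."""
    return int(round((r_max - r_min) / step)) + 1

def search_space():
    """Return (distance steps, number of 3D FFTs, total orientations)."""
    steps = distance_steps(R_MIN, R_MAX, R_STEP)
    n_ffts = steps * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
    per_fft = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
    return steps, n_ffts, n_ffts * per_fft

if __name__ == "__main__":
    steps, n_ffts, total = search_space()
    print(f"distance steps:     {steps}")
    print(f"3D FFTs:            {n_ffts}")
    print(f"total orientations: {total}")
```

Under these assumptions the product reproduces the 21924 FFTs and 1616412672 orientations stated in the log, which supports this reading of the garbled values.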
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It is interesting to take a closer look at the
matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible to draw conclusive decisions about the true picture of
evolution in the near future, and the genes responsible for pathogenesis can
also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we carried out homology modelling, and we went a step further by
presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAG (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might just be a stepping stone for further
discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, so we can look forward to a better, disease-free
world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- PubMed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
The first major bioinformatics project was undertaken by Margaret Dayhoff
in 1965, who developed the first protein sequence database, called the
Atlas of Protein Sequence and Structure.
Subsequently, in the early 1970s, the Brookhaven National Laboratory
established the Protein Data Bank for archiving three-dimensional protein
structures. At its onset, the database stored fewer than a dozen protein
structures, compared to more than 30,000 structures today. The first
sequence alignment algorithm was developed by Needleman and Wunsch in
1970. This was a fundamental step in the development of the field of
bioinformatics, which paved the way for the routine sequence comparisons
and database searching practiced by modern biologists.
A recent advance in bioinformatics is molecular modeling, which is
aimed at understanding structure-function and structure-property
relationships in physicochemical processes and pharmaceuticals, and has
thus become increasingly important for finding and designing new drugs. In
fact, computers are playing an important role in new drug discovery and
drug design.
HEPATITIS
Hepatitis (plural hepatitides) implies injury to the liver characterized by
the presence of inflammatory cells in the liver tissue. Etymologically, the
word comes from the ancient Greek hepar or hepato-, meaning liver, and the
suffix -itis, denoting inflammation. The condition can be self-limiting,
healing on its own, or can progress to scarring of the liver.
Hepatitis is acute when it lasts less than 6 months and chronic when it
persists longer. A group of viruses known as the hepatitis viruses causes
most cases of liver damage worldwide. Hepatitis can also be due to toxins
(notably alcohol), other infections or an autoimmune process.
It may run a subclinical course, in which the affected person may not feel
ill. The patient becomes unwell and symptomatic when the disease impairs
liver functions that include, among other things, screening of harmful
substances, regulation of blood composition and production of bile to help
digestion.
Causes
Acute hepatitis
Viral hepatitis: Hepatitis A to E (more than 95% of viral causes), Herpes simplex, Cytomegalovirus, Epstein-Barr, Yellow fever virus, Adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: Paracetamol, Amoxicillin, antituberculosis medicines, Minocycline and many others
Ischemic hepatitis (circulatory insufficiency)(1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: Hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)
Autoimmune: Autoimmune hepatitis
Alcohol
Drugs: Methyldopa, Nitrofurantoin, Isoniazid, Ketoconazole
Non-alcoholic steatohepatitis
Hereditary: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis
Viral hepatitis
A virus is a particle which is smaller than a bacterium and contains complex
genetic information in the form of DNA or RNA. This genetic material allows
the virus to infect bacteria or living cells and set up the machinery to
reproduce itself, leading to destruction of the cell in which it resides. To
date, five viruses, labeled A through E, have been identified which appear
to cause viral hepatitis. Viruses A and E can be contracted from
contaminated water or food (by mouth), while viruses B, C and D are
transmitted by direct injection into the bloodstream (through any method of
injection under the skin). The term viral hepatitis describes any one of the
illnesses caused by the five viruses mentioned and consists of an infection
of liver cells which leads to damage of the liver over days in some cases
but over many years in others. Thirty years ago, none of the hepatitis
viruses had been identified. In the 1960s, transfusion-related viral
hepatitis was extremely common, with 30% of patients receiving blood
products becoming infected. By 1970, a blood test for the Australia antigen
had been developed, which appeared to identify those infected with one
hepatitis virus, which we now call hepatitis B. The investigator who
discovered the Australia antigen, the protein which makes up the coat of the
virus and which is now called the hepatitis B surface antigen (HBsAg), was
awarded the Nobel Prize. Our understanding of viral hepatitis has grown
tremendously since the discovery of the Australia antigen.
Currently, 11 viruses are recognized as causing hepatitis.
Two are herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV])
and 9 are hepatotropic viruses. EBV and CMV cause mild, self-resolving forms
of hepatitis with no permanent hepatic damage. Both viruses cause the
typical infectious mononucleosis symptoms of fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E
(formerly called enterically transmitted NANB hepatitis) are transmitted by
fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A,
non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases. The virus enters via
the gut, replicates in the alimentary tract and spreads to infect the liver,
where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women; the mortality rate is high (up to 40%). Similar to hepatitis
A: the virus replicates in the gut initially, before invading the liver, and
virus is shed in the stool prior to the onset of symptoms. Viraemia is
transient. A large inoculum of virus is needed to establish infection.
Little is known yet about its epidemiology; the incidence of infection
appears to be low in first world countries.
Hepatitis C
Originally described as a putative togavirus related to the flaviviruses and
pestiviruses; it is now classified in the family Flaviviridae.
Enveloped virus with a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees. Incubation period:
6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, which
may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy, and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B, and it increases the severity of liver disease in
hepatitis B carriers. The virus particle is 36 nm in diameter and is
encapsulated with HBsAg derived from HBV; delta antigen is associated with
virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis but is no longer believed to be a
major agent of liver disease. It has been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the blood stream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients are
called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis, and occasionally liver
cell cancer. These outcomes are linked to the virus and its effects, although
it is unlikely that the virus directly causes cancer. Those patients who
develop hepatitis (damage to liver cells with inflammation) do so on account
of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called
the immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which
are nearly identical have been identified in Eastern woodchucks, ground
squirrels, and Peking ducks. The members of this virus family, termed the
hepadnaviruses, have similar life cycles to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of
humans), Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres & tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged, and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
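The marker meanings listed above can be collected into a small lookup, shown here as a minimal Python sketch. The dictionary wording paraphrases this section only; the function name and the marker labels used as keys are illustrative assumptions, and the sketch is a reading aid for the list, not a diagnostic tool.

```python
# Marker meanings paraphrased from the serology list in this section.
# Keys and function name are hypothetical labels chosen for illustration.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "high level of viral replication in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV (acute, chronic, or cleared)",
}


def interpret_markers(detected):
    """Return one note per detected marker, in the order listed above."""
    unknown = set(detected) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unknown marker(s): {sorted(unknown)}")
    return [f"{name}: {meaning}"
            for name, meaning in MARKER_MEANING.items()
            if name in detected]
```

For example, calling `interpret_markers({"HBsAg", "HBeAg", "core IgM"})` returns the three notes for an early, highly replicative infection as described in the text.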
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue, and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms
following exposure, may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of
their ongoing infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5′ end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an
RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
These proteins are produced through the use of multiple
in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to be able to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA pol II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3′ end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of
hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid,
which is made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to as a group named surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface
protein. The absence of the hepatitis B core, polymerase, and genome makes
these particles non-infectious. High levels of these
non-infectious particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope which may be
responsible for triggering the immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. This 185 amino acid protein is expressed in
the cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains 3 distinct antigen particles: a spherical 22 nm particle,
a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle,
and tubular or filamentous particles that vary in length. Of these, only the
Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete downregulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure
with cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions
leading to acute and chronic disease are still poorly understood. Studies on
how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing
inflammation of the liver are still at an early stage, and the role of the
virus in liver cancer is still elusive. The state of knowledge in this very
active field is therefore reviewed with an emphasis on past accomplishments
as well as goals for the future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives
under the stated assumption that accurate models can be built for query
sequences that have a greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
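To make the 30% rule concrete, the following minimal Python sketch computes percent sequence identity from a pairwise alignment. It assumes the two sequences are already aligned and gapped with "-", and it counts identity over columns where both sequences have a residue; other denominators (for example, the length of the shorter sequence) are also in common use, so this definition and the function names are assumptions made for illustration.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over alignment columns that contain no gap.

    Both arguments are rows of the same pairwise alignment, so they
    must have equal length; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if q == t:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True when identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

A template that passes this check is only a candidate: as the text notes, alignment quality, not the identity figure alone, determines how good the final model is.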
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information - all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domains structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4) from human. Primary accession number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
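The Smith-Waterman algorithm mentioned above can be illustrated with a short Python sketch. This is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the default values below are arbitrary illustrative choices), whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring parameters are
    illustrative assumptions, not those of any particular program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because every cell is floored at zero, the alignment can start and end anywhere in either sequence, which is what distinguishes this local method from a global alignment. The raw score by itself says nothing about significance; that is the role of the similarity statistics the package computes.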
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
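As an illustration of applying such a threshold, the Python sketch below filters BLAST hits by E-value. It assumes the search results were saved in the standard 12-column tab-separated format (as produced by BLAST+ with `-outfmt 6`); the cutoff of 1e-5 and the function name are illustrative assumptions, not values taken from this work.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue=1e-5):
    """Return (subject id, percent identity, E-value) for hits at or
    below max_evalue, sorted from most to least significant."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines, comment lines, and malformed rows.
        if len(row) < len(COLUMNS) or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            hits.append((hit["sseqid"], float(hit["pident"]), evalue))
    return sorted(hits, key=lambda h: h[2])
```

The function takes the file contents as a string, so a saved result file can be passed in with `filter_hits(open(path).read())`, where `path` is whatever file the search output was written to.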
5. Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
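Two of the parameters listed above follow directly from amino acid composition and can be sketched in a few lines of Python. The sketch uses the commonly documented formulas: aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, and an extinction coefficient at 280 nm of 1490 per Tyr, 5500 per Trp, and 125 per cystine (in M^-1 cm^-1). The function names are illustrative assumptions, and half-life and instability index are omitted because they rely on lookup tables not reproduced here.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(sequence, cystines_formed=False):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With cystines_formed=True, all cysteines are assumed to be paired
    into cystines; otherwise all are assumed to be reduced.
    """
    seq = sequence.upper()
    coefficient = 1490 * seq.count("Y") + 5500 * seq.count("W")
    if cystines_formed:
        coefficient += 125 * (seq.count("C") // 2)
    return coefficient
```

Both functions expect a raw one-letter sequence with white space and numbers already removed, matching the input cleaning the tool is described as performing.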
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet, and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
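The percentages quoted above are three-state accuracies: the share of residues whose predicted state matches the observed one. A minimal Python sketch of that measure follows; the single-letter state codes H (helix), E (sheet), and C (coil) and the function name are assumptions made for illustration.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both strings use one character per residue: H, E, or C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    allowed = set("HEC")
    if set(predicted) - allowed or set(observed) - allowed:
        raise ValueError("states must be H, E, or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, a prediction that gets 7 of 8 residues right scores 87.5, which can be compared directly with the per-database figures reported for the method.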
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references and journals available online and in PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Ran a BLAST similarity search to find the template structure and retrieved its PDB ID (2B8N).
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified our model through the different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
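The Cα agreement figures quoted above are usually reported as a root-mean-square deviation (RMSD); assuming that is the measure intended, the Python sketch below shows the calculation. It further assumes that the two coordinate lists contain matched Cα atoms in the same order and that the structures have already been superposed; the optimal superposition step itself is not shown, and the function name is illustrative.

```python
import math


def ca_rmsd(coords_model, coords_reference):
    """RMSD in the input units (e.g. angstroms) between matched atoms.

    Each argument is a sequence of (x, y, z) tuples; element k of one
    list must correspond to element k of the other.
    """
    if len(coords_model) != len(coords_reference):
        raise ValueError("coordinate lists must have the same length")
    if not coords_model:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(coords_model, coords_reference):
        # Squared Euclidean distance between the matched pair of atoms.
        total += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(total / len(coords_model))
```

Because deviations are squared before averaging, a few badly placed loop residues can dominate the value, which is consistent with the observation that loop regions are the least accurate parts of a model.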
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
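The E-value screening described above can be sketched in a few lines of Python. This is only an illustration: the cutoff of 1e-5, the Hit record and the hit names are assumptions made for the example, not values or tools prescribed in this report.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database-search hit (hypothetical record layout)."""
    template_id: str
    score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first.

    The default cutoff is an assumed, commonly used value; the appropriate
    threshold depends on the database size and the application.
    """
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Made-up hits for illustration only.
example_hits = [
    Hit("hit_A", 206.0, 6e-54),
    Hit("hit_B", 27.0, 6.0),
    Hit("hit_C", 197.0, 2e-51),
]

for hit in select_templates(example_hits):
    print(hit.template_id, hit.score, hit.e_value)
```

A hit that fails the cutoff is simply dropped here; in practice such marginal hits would be passed on to fold-recognition or meta-servers as discussed above.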
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
a receptor to bind a larger-than-usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on a cell surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a
ligand binds to the binding site of a protein through many weak noncovalent
bonds, tight binding depends upon a precise fit to the surface-exposed
amino acid residues on the protein.
Active Site
The active site of a protein (enzyme) is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active (functional) site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g., tryptophan or proline) can play critical roles.
His   0.360    Tyr  -0.040    Asp   0.045    Gly  -0.070
Trp  -0.140    Met   0.025    Val  -0.060    Asn   0.080
Leu  -0.180    Phe  -0.120    Gln   0.050    Cys   0.210
Ile  -0.005    Ala   0.025    Glu   0.050    Arg   0.055
Pro  -0.200    Lys   0.100    Thr   0.100    Ser   0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
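As an illustration of what is plotted, the following Python sketch computes a torsion (dihedral) angle from four atom positions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1). The function name and the test coordinates are assumptions made for this example and are not taken from any program used in this work.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Remove the components of b0 and b2 that lie along the central bond.
    d0 = _dot(b0, axis)
    d2 = _dot(b2, axis)
    v = (b0[0] - d0 * axis[0], b0[1] - d0 * axis[1], b0[2] - d0 * axis[2])
    w = (b2[0] - d2 * axis[0], b2[1] - d2 * axis[1], b2[2] - d2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from a real structure): a cis and a trans arrangement.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

Applying such a function to every residue of a structure gives the (φ, ψ) pairs that are drawn as points on the plot.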
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of the various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
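The idea of gradient-based minimization can be illustrated with a deliberately tiny model: two atoms interacting through a Lennard-Jones (van der Waals) potential, with the separation r as the only degree of freedom. This is a minimal sketch under stated assumptions (reduced units, epsilon = sigma = 1, plain steepest descent with a capped step); real minimizers in modeling packages work on all atomic coordinates and use more elaborate force fields and algorithms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_distance(r0, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on r: move against the gradient until the force is small.

    max_move caps each displacement so that the huge repulsive force of two
    overlapping atoms cannot throw them far apart in a single step.
    """
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g
        move = max(-max_move, min(max_move, move))
        r += move
    return r


# Two atoms placed too close together (a "bad contact"), in reduced units.
r_start = 0.9
r_min = minimize_distance(r_start)
print(r_start, lj_energy(r_start))
print(r_min, lj_energy(r_min))
```

Starting from the overlapping geometry, the energy drops from a positive (repulsive) value to the bottom of the nearest well at r = 2^(1/6) sigma. The procedure only ever goes downhill, which is why it ends in the nearest local minimum rather than crossing energy barriers.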
Result and discussion
1. Swiss-Prot entry. Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                            Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                    206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                    197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                          192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp          27      6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                     27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count   Percent
Ala (A)     74     14.1%
Arg (R)     33      6.3%
Asn (N)     11      2.1%
Asp (D)     21      4.0%
Cys (C)      5      1.0%
Gln (Q)     32      6.1%
Glu (E)     28      5.4%
Gly (G)     51      9.8%
His (H)     16      3.1%
Ile (I)     15      2.9%
Leu (L)     81     15.5%
Lys (K)     10      1.9%
Met (M)     12      2.3%
Phe (F)     10      1.9%
Pro (P)     28      5.4%
Ser (S)     27      5.2%
Thr (T)     22      4.2%
Trp (W)      4      0.8%
Tyr (Y)      5      1.0%
Val (V)     38      7.3%
Pyl (O)      0      0.0%
Sec (U)      0      0.0%
(B)          0      0.0%
(Z)          0      0.0%
(X)          0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N    711
Oxygen    O    719
Sulfur    S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
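The two extinction coefficients above can be reproduced from the Trp, Tyr and Cys counts in the composition table. The sketch below uses the per-residue molar values that ProtParam documents for 280 nm in water (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are introduced here for illustration only.

```python
# Molar extinction contributions at 280 nm, in M^-1 cm^-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Sum the Trp, Tyr and (optionally) cystine contributions.

    With all_cys_paired=True every pair of Cys residues is counted as one
    cystine; an odd Cys left over contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


# Counts and molecular weight taken from the ProtParam output above.
N_TRP, N_TYR, N_CYS = 4, 5, 5
MOL_WEIGHT = 55252.6

for paired in (True, False):
    ext = extinction_coefficient(N_TRP, N_TYR, N_CYS, paired)
    print(paired, ext, round(absorbance_1_g_per_l(ext, MOL_WEIGHT), 3))
```

With 4 Trp, 5 Tyr and 5 Cys this gives 29700 when the cysteines are counted as two cystines and 29450 when they are not, matching the reported values.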
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
3-10 helix (Gg):          0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states (?):     0 is  0.00%
Other states:             0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
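The scores in this table are pairwise percent identities. As a rough illustration of how such a number is obtained from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. This is an assumed simplification: ClustalW derives its scores from its own pairwise alignments, so the sketch will not necessarily reproduce the tabulated values exactly. The two short sequences are made up for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned fragments, not taken from the alignment in this report.
row_a = "AC-GT"
row_b = "ACTGA"
print(percent_identity(row_a, row_b))
```

Read this way, the scores of 2 to 3 between GLCTK_HUMAN and the HBeAg sequences indicate essentially no sequence identity, whereas the HBV genotype A sequences score 97 to 98 against each other.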
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - Save - Layer (PDB).
5) Choose File - Save - Project (PDB).
6) Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: request submission form
Please fill in these fields:
Your email address:
Lakshay1202gmailcom (MUST be correct)
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information:
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure: it displays the φ and ψ backbone conformational angles for each residue and shows which combinations are possible for a polypeptide. Because these two angles fix the local backbone conformation, the plot also reflects the relative placement of successive alpha-carbon atoms in the backbone chain.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                          Score     %
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
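The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline quoted by PROCHECK can be checked in the same step. The short Python sketch below does this for the counts reported above; the dictionary keys and function names are chosen here for illustration.

```python
# Residue counts per Ramachandran region, from the PROCHECK summary above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if more than `threshold` percent lie in the most favoured regions."""
    return region_percentages(counts)["most favoured"] > threshold


for name, pct in region_percentages(region_counts).items():
    print(f"{name}: {pct:.1f}%")
print("meets 90% guideline:", meets_guideline(region_counts))
```

For this model, 85.6% of the residues lie in the most favoured regions, so it falls short of the 90% guideline for a good quality model.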
Docking result, by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
75
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
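The search-space figures reported in the docking log above can be cross-checked with simple arithmetic. The following is a minimal sketch and not part of Hex. It assumes that the fused digits in the "Total 6D space" entry stand for an iteration count of 27 x 812 x 1 (27 distance steps, 812 receptor rotations, 1 ligand rotation) and an FFT grid of 64 x 24 x 48; this grouping is an inference, because the punctuation was lost in the transcript. The function names are hypothetical.

```python
from math import prod

# Assumed reading of the log (punctuation was lost in the transcript):
# 27 distance steps x 812 receptor rotations x 1 ligand rotation.
ITERATE = (27, 812, 1)
# Assumed FFT grid dimensions.
FFT_GRID = (64, 24, 48)


def search_space(iterate, fft_grid):
    """Return (number of 3D FFTs, total orientations sampled)."""
    n_ffts = prod(iterate)
    return n_ffts, n_ffts * prod(fft_grid)


def net_formal_charge(n_positive, n_negative):
    """Net formal charge from the counts of charged residues."""
    return n_positive - n_negative


n_ffts, total = search_space(ITERATE, FFT_GRID)
print(n_ffts)                        # number of 3D FFTs
print(total)                         # total 6D orientations
print(net_formal_charge(104, 114))   # counts taken from the log
```

Under these assumed groupings the three printed values agree with the 21924 3D FFTs, the 1616412672 orientations and the net formal charge of -10 reported in the log.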
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will become possible in the near future to draw
firm conclusions about the true picture of evolution, and the genes
responsible for pathogenesis may also be identified.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein
present in the different species, we performed homology modelling and
present a theoretical model built with available online modelling tools.
According to our study, the HBeAg (Glycerate kinase)
protein encoded by the gene is one of the secondary causes of the
pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which
drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it is a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seegerfcccedu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Hepatitis (plural hepatitides) implies injury
to the liver, characterized by the presence of inflammatory cells in the
liver tissue. Etymologically, it derives from the ancient Greek hepar or
hepato-, meaning 'liver', and the suffix -itis, denoting 'inflammation'.
The condition can be self-limiting, healing on its own, or can progress to
scarring of the liver.
Hepatitis is acute when it lasts less than 6 months
and chronic when it persists longer. A group of
viruses known as the hepatitis viruses cause most cases of
liver damage worldwide. Hepatitis can also be due to toxins
(notably alcohol), other infections, or
an autoimmune process.
It may run a subclinical course,
in which the affected person may not feel ill.
The patient becomes unwell and symptomatic when the
disease impairs liver functions, which include, among other
things, screening of harmful substances, regulation of blood
composition, and production of bile to help digestion.
Causes
Acute hepatitis
Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis
Viral hepatitis
A virus is a particle that is smaller than a bacterium and carries its
genetic information as DNA or RNA. This genetic material allows the
virus to infect bacteria or other living cells and to set up the machinery
to reproduce itself, leading to destruction of the cell in which it resides.
To date, five viruses, labeled A through E, have been identified which appear
to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by direct
entry into the bloodstream (through any method of injection under the
skin). The term viral hepatitis describes any one of the illnesses caused by
the five viruses mentioned, and consists of an infection of liver cells which
leads to damage of the liver, over days in some cases but over many years in
others. Thirty years ago, none of the hepatitis viruses had been identified. In
the 1960s, transfusion-related viral hepatitis was extremely common, with
30% of patients receiving blood products becoming infected. By 1970 a
blood test for the Australia antigen was developed, which appeared to
identify those infected with one hepatitis virus, which we now call hepatitis
B. The investigator who discovered the Australia antigen, the protein which
makes up the coat of the virus and which is now called the hepatitis B
surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding
of viral hepatitis has grown tremendously since the discovery of the
Australia antigen.
Currently, 11 viruses are recognized as causing hepatitis.
Two are herpes viruses (cytomegalovirus [CMV] and Epstein-Barr
virus [EBV]) and 9 are hepatotropic viruses. EBV and CMV cause mild,
self-resolving forms of hepatitis with no permanent hepatic damage. Both
viruses cause the typical infectious mononucleosis picture of fatigue,
nausea and malaise.
Of the nine human hepatotropic viruses, only five are well
characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly
discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and
hepatitis E (formerly called enterically transmitted NANB hepatitis) are
transmitted by fecal-oral contamination. The most important types include
hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called
non-A, non-B hepatitis) and hepatitis D (formerly called delta
hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the
gut, replicates in the alimentary tract and spreads to infect the liver, where
it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women; the mortality rate is high (up to 40%). As with hepatitis A,
the virus replicates in the gut initially before invading the liver, and virus
is shed in the stool prior to the onset of symptoms. Viraemia is transient. A
large inoculum of virus is needed to establish infection. Little is known yet
about its epidemiology.
The incidence of infection appears to be low in first world countries.
Hepatitis C
Putative Togavirus related to the Flavi- and Pestiviruses.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period:
6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure.
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B. Increased severity of liver disease in hepatitis B
carriers. Virus particle 36 nm in diameter, encapsulated with HBsAg derived
from HBV; delta antigen is associated with virus particles; ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to be a
major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells, and other cells in the body, once it gains access
to the bloodstream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients are
called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver
cell cancer. These outcomes are linked to the virus and its effects, although
it is unlikely that the virus directly causes cancer. Those patients who develop
hepatitis (damage to liver cells with inflammation) do so on account of the
body's normal inclination to attack the foreign proteins contained in viruses
and in the cells in which the viruses are found. This process, called the
immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which
are nearly identical have been identified in Eastern woodchucks, ground
squirrels and Peking ducks. The members of this virus family, termed the
Hepadna viruses, have similar life cycles to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of
humans), Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide, there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
15
and indicates immunity following infection It remains detectable for life and
is not found in chronic carriers (see below)
2) e antibody (anti-HBe) becomes detectable as viral replication falls It
indicates low infectivity in a carrier
3) Core IgM rises early in infection and indicates recent infection
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers as well as those who clear the infection Its presence
indicates exposure to HBVof the chronic carrier
FigHepatitis B virus in serum
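The marker-by-marker readings listed above can be restated as a small lookup. The sketch below is only an illustration of that list: the marker labels used as dictionary keys and the function name `interpret` are assumptions made here, and the readings are the ones given in the text, nothing more.

```python
# Marker -> reading, restating only the statements made in the serology list.
# The key spellings (e.g. "anti-HBc IgM") are labels chosen for this sketch.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV",
}


def interpret(positive_markers):
    """Return one reading per positive marker, in the order of the table."""
    positives = set(positive_markers)
    unknown = positives - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [f"{m}: {MARKER_MEANING[m]}" for m in MARKER_MEANING if m in positives]


if __name__ == "__main__":
    for line in interpret(["anti-HBc IgM", "HBsAg"]):
        print(line)
```

Keeping the table as plain data makes it easy to check each entry against the list it was copied from.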
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others who do not develop significant symptoms
following exposure may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of their
on-going infection, although some have persistent fatigue.
Molecular virology
The genome is circular, double stranded, and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but less than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins is through the use of multiple
in-frame start codons. The HBV genome also contains parts that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA pol II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al. 765).
Fig HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and these usually outnumber the virions in a ratio of 100:1. The two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to collectively as surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering an immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; it can only be isolated by analyzing
an infected hepatocyte. It is a 185-amino-acid protein expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigenic particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or
filamentous particles that vary in length. Of these, the Dane particle is the
infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any
body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
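To make the 30% rule concrete, the sketch below computes percent identity from a pairwise alignment. It assumes the two sequences are already aligned, with "-" marking gaps, and it counts identity over aligned (non-gap) columns only. That denominator is one of several conventions in use and is an assumption of this sketch; the function name and example strings are illustrative.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of aligned (non-gap) columns in which both residues are identical."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for q, t in columns if q == t)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment: 5 aligned columns, 4 identical.
    identity = percent_identity("ACDE-G", "ACQEFG")
    print(f"{identity:.1f}% identity; above 30% threshold: {identity > 30.0}")
```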
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu
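As an illustration of the Smith-Waterman algorithm mentioned above, the following is a minimal score-only sketch. It is not SSEARCH: it assumes a simple match/mismatch score and a linear gap penalty (the default values are arbitrary choices for this example), whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the dynamic-programming matrix in memory.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                                # start a new local alignment
                previous[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous[j] + gap,                # gap in b
                current[j - 1] + gap,             # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Clamping each cell at zero is what makes the alignment local: a poorly matching prefix cannot drag down the score of a good region that follows it.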
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
The parameters it calculates include:
extinction coefficient
half-life
instability index
aliphatic index
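Of the parameters listed above, the aliphatic index is the simplest to reproduce by hand. The sketch below uses the standard formula AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is an assumption of this sketch, and it skips white space and digits in the input in the same spirit as the raw-sequence handling described above.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    with X the mole percent (100 * count / length) of each residue.
    """
    # Ignore white space, digits and any other non-letter characters.
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    n = len(residues)

    def mole_percent(aa):
        return 100.0 * residues.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(round(aliphatic_index("AVIL GG 10"), 2))
```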
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html)
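The accuracy figures quoted above are per-residue, three-state percentages, often called Q3. As a sketch of how such a figure is obtained, the function below compares a predicted and an observed state string. The one-letter codes H (helix), E (sheet) and C (coil) and the function name are assumptions of this example, not SOPMA's own output format.

```python
def q3_accuracy(predicted, observed, states="HEC"):
    """Percent of residues whose predicted three-state class matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    invalid = (set(predicted) | set(observed)) - set(states)
    if invalid:
        raise ValueError(f"unexpected state codes: {sorted(invalid)}")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy strings: 7 of 8 positions agree.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```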
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence in pdb format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structural Analysis and Verification Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
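The sequence retrieved in the second step of the protocol above arrives as a FASTA record: a header line starting with ">" followed by one or more sequence lines. A minimal parser for that format might look like the sketch below. The record in the usage example is a made-up placeholder, not the actual Swiss-Prot entry, and the function name is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header line")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">example_protein hypothetical record\nMKTAYIAK\nQRQISFVK\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), seq)
```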
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
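The agreement between matched Cα atoms quoted above is commonly reported as a root-mean-square deviation (RMSD) in Å. The sketch below computes it for two equally long lists of Cα coordinates. It assumes the two structures have already been superimposed and that atoms are paired by position in the list; the optimal superposition step itself is not shown, and the function name is illustrative.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two lists of matched (x, y, z) coordinates, in input units.

    Assumes the structures are already superimposed and that coords_a[i]
    corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # every atom shifted by 2 Å
    print(ca_rmsd(model, reference))
```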
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
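The selection logic described above (discard hits with a poor E-value, then weigh coverage and similarity) can be sketched as a simple filter-and-sort. Everything specific here is an assumption made for illustration: the `Hit` fields, the 1e-5 E-value cutoff, the ordering of the sort keys, and the template identifiers, which are placeholders rather than real PDB entries.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str   # placeholder identifier, not a real PDB ID
    e_value: float     # E-value reported by the database search
    identity: float    # percent sequence identity to the query
    coverage: float    # fraction of the query covered by the alignment


def rank_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Drop hits with a poor E-value, then rank by coverage, identity, E-value."""
    reliable = [h for h in hits if h.e_value <= max_e_value]
    return sorted(reliable, key=lambda h: (-h.coverage, -h.identity, h.e_value))


if __name__ == "__main__":
    candidates = [
        Hit("tmplA", 1e-40, 62.0, 0.95),
        Hit("tmplB", 1e-3, 28.0, 0.99),   # poor E-value: filtered out
        Hit("tmplC", 1e-20, 45.0, 0.95),
    ]
    for hit in rank_templates(candidates):
        print(hit.template_id, hit.e_value, hit.identity, hit.coverage)
```

A ranking like this only narrows the candidates. As the text notes, the final choice may still depend on function and on the plausibility of the resulting models.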
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraint and
thus are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
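The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be made concrete with a small coordinate transform. The sketch below rotates a set of points about the x, y and z axes in that order and then translates them. The Euler-angle convention and the function name are assumptions chosen for this example; docking programs may parameterize orientation differently.

```python
import math


def rigid_transform(points, rx, ry, rz, translation):
    """Apply a rigid-body move with six degrees of freedom to (x, y, z) points.

    rx, ry, rz are rotation angles in radians about the x, y and z axes,
    applied in that order (R = Rz * Ry * Rx); translation is (tx, ty, tz).
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Combined rotation matrix R = Rz * Ry * Rx.
    r = [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]
    tx, ty, tz = translation
    moved = []
    for x, y, z in points:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved


if __name__ == "__main__":
    # Quarter turn about z, then shift one unit along z.
    ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    for atom in rigid_transform(ligand, 0.0, 0.0, math.pi / 2, (0.0, 0.0, 1.0)):
        print(tuple(round(c, 6) for c in atom))
```

Because the move is rigid, distances between atoms of the ligand are unchanged. Only its position and orientation relative to the receptor vary.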
Aim of docking
The aim of docking is to find out the new drugs target it will open new
vistas for further drug development The finding of our docking will be
useful in finding a cure for the infectious disease bird flu also it will open
new avenues for finding other possible drug targets in influenza A virus The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process
Receptor
A receptor is a site on the surface of the cell that serves as a recognition
or binding site for antigens, antibodies or other cellular or immunological
components. It is a molecule within a cell or on the cell surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change
in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor).
Because a ligand binds through many weak noncovalent bonds formed with the
binding site of a protein, tight binding depends upon a precise fit to the
surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the
substrates (and the cofactor, if any). It also contains the residues that
directly participate in the making and breaking of bonds. These residues are
called the catalytic groups. In essence, the interaction of the enzyme and
substrate at the active site promotes the formation of the transition state.
The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate
enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a
protein active or functional site, as this depends greatly on the type of
function. With that in mind, below are preferences for the 20 amino acids to
lie within functional regions on proteins. These were worked out by
considering how often particular amino acids were in contact with bound
non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by
chance; negative values mean that it makes fewer. The values below do not
include protein-protein or protein-peptide interactions, where many of the
amino acids with negative values (e.g. tryptophan or proline) can play
critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the dihedral angles phi (φ) against psi (ψ) of amino acid residues
in a protein structure. It shows the possible conformations of the φ and ψ
angles for a polypeptide. In a polypeptide, the main chain N-Cα and Cα-C
bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable
conformations. For each conformation, the structure was examined for close
contacts between atoms. Atoms were treated as hard spheres with dimensions
corresponding to their van der Waals radii. The angles that cause spheres to
collide correspond to sterically disallowed conformations of the polypeptide
backbone.
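A torsion angle of the kind plotted above can be computed from four atom positions. The sketch below is a generic dihedral calculation in plain Python, not code from any validation server; the helper names and the test coordinates are illustrative assumptions. For residue i, φ uses the atoms C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a, k):
    return tuple(k * x for x in a)


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    # Unit vector along the central bond.
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying the function to every residue of a structure gives the (φ, ψ) pairs that are scattered on the plot.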
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use, the
server can take several minutes to run. The run time also depends on how
many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond
angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties by residue in a
printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through
high energy barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms too near each other in space and having a huge van der
Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation. Energy
minimization is usually performed by gradient optimization: atoms are moved
so as to reduce the net forces on them. The minimized structure has small
forces on each atom and therefore serves as an excellent starting point for
molecular dynamics simulations.
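The idea of moving atoms downhill until the net force is small can be sketched with a single pair of atoms. This is a toy steepest-descent example under stated assumptions, not the minimizer of any modeling package: the Lennard-Jones 12-6 form, the reduced units (epsilon = sigma = 1), the step size and the cap on the move per iteration are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of two atoms a distance r apart."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-6, max_iter=10000):
    """Steepest descent on the pair distance r.

    The move per iteration is capped so that the very large repulsive
    force of an overlapping pair does not throw the atoms far apart.
    """
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:      # net force is (almost) zero
            break
        move = -step * gradient      # step downhill
        move = max(-max_move, min(max_move, move))
        r += move
    return r


r_start = 0.9                 # two atoms too close: strongly repulsive
r_min = minimize(r_start)     # relaxes to the nearby energy minimum
```

Starting from an overlapping pair with positive energy, the distance relaxes to the bottom of the nearby well, where the force vanishes. On a surface with several wells the same procedure would stop in whichever minimum is downhill from the start.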
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27  6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700, Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450, Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
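The two extinction coefficients reported above can be reproduced from the residue counts. The sketch below assumes the commonly used molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6; the function name is hypothetical and this is not ProtParam's own code.

```python
# Assumed molar extinction values at 280 nm, in M-1 cm-1.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient at 280 nm.

    If all Cys are assumed to form cystines, every pair of Cys
    residues contributes one cystine; an odd Cys is left unpaired.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Counts taken from the amino acid composition: Tyr 5, Trp 4, Cys 5.
molecular_weight = 55252.6
ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)

# Absorbance of a 1 g/l (0.1%) solution is the coefficient divided
# by the molecular weight.
abs_paired = ext_paired / molecular_weight
abs_reduced = ext_reduced / molecular_weight
```

With these inputs the sketch gives 29700 and 29450 for the two coefficients, and absorbances that round to 0.538 and 0.533, in agreement with the listing.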
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: window width 17, similarity threshold 8, number of states 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Your email address: Lakshay1202gmailcom
Your name: Lakshay
Request title: Lakshay project
Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044, Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using Deep View and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
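The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The snippet below is only an arithmetic sketch; the dictionary keys and variable names are illustrative, and the counts are those listed in the Procheck summary.

```python
# Residue counts per Ramachandran region, from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {
    region: round(100.0 * n / total, 1) for region, n in counts.items()
}

# Rule of thumb quoted by PROCHECK: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
```

For these counts the most favoured regions hold 85.6% of the residues, which is below the 90% guideline quoted above for a good quality model.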
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
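The search-space figures in the docking log can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions: the log as printed here appears to have lost its decimal points and separators, so "000 to 1950 with steps of 075" is read as 0.00 to 19.50 in steps of 0.75, "Iterate[278121]" as [27, 812, 1] and "FFT[642448]" as [64, 24, 48]. The variable and function names are hypothetical. Under that reading, the counts reproduce the totals the log reports.

```python
from math import prod

def count_steps(start, stop, step):
    """Number of grid points from start to stop, both inclusive."""
    return round((stop - start) / step) + 1

# Assumed reading of "distance range = 000 to 1950 with steps of 075"
radial_steps = count_steps(0.00, 19.50, 0.75)

# Assumed reading of "Iterate[278121]": distance steps, receptor rotations
# (the log reports "Receptor 812"), ligand rotations ("Ligand 1")
iterate = [radial_steps, 812, 1]

# Assumed reading of "FFT[642448]" as three FFT grid dimensions
fft_grid = [64, 24, 48]

n_ffts = prod(iterate)                    # log: "Done 21924 3D FFTs"
n_orientations = n_ffts * prod(fft_grid)  # log: "= 1616412672"

# Net formal charge from the counted +ve and -ve residues (log: -10)
net_charge = 104 - 114

print(radial_steps, n_ffts, n_orientations, net_charge)
# 27 21924 1616412672 -10
```

The 27 distance steps also match the 27 "R =" lines printed during the 3D FFT stage, which supports the assumed placement of the decimal points.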
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although the strains are all closely related, these proteins play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw firm conclusions about the true picture of its evolution in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we carried out homology modelling and took a step forward by presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analyzing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study should be useful in the search for a cure for hepatitis B, and should also open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.
The work presented is only a small part of a large problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed.
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Causes
Acute hepatitis
Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis
Viral hepatitis
A virus is a particle which is smaller than bacteria and contains genetic information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria or living cells and set up the machinery to reproduce itself, leading to destruction of the cell in which it resides. To date, five viruses, labeled A through E, have been identified which appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth), while viruses B, C and D are transmitted by direct injection into the bloodstream (through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by the five viruses mentioned, and consists of an infection of liver cells which leads to damage of the liver over days in some cases, but over many years in others. Thirty years ago, none of the hepatitis viruses had been identified. In the 1960s, transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970, a blood test called the Australia antigen was developed, which appeared to identify those infected with one hepatitis virus, which we now call hepatitis B. The investigator who discovered the Australia antigen, the protein which makes up the coat of the virus and which is now called the hepatitis B surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the discovery of the Australia antigen.
Currently, 11 viruses are recognized as causing hepatitis. Two are herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses. EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis symptoms of fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in children.
Adults, especially pregnant women, may develop more severe disease. Although convalescence may be prolonged, there is no chronic form of the disease. Fulminant hepatitis is rare: 0.1% of cases. The virus enters via the gut, replicates in the alimentary tract, and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first world countries is declining. There is an especially high incidence in developing countries and rural areas. In rural areas of South Africa the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, and the mortality rate is high (up to 40%). As with hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known yet; the incidence of infection appears to be low in first world countries.
Hepatitis C
Originally described as a putative togavirus related to the flaviviruses and pestiviruses; now classified in the family Flaviviridae.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of individuals develop chronic infection following exposure, which can lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are already infected with hepatitis B. Increased severity of liver disease in hepatitis B carriers. Virus particle: 36 nm in diameter, encapsulated with HBsAg derived from HBV; delta antigen is associated with virus particles; ssRNA genome. Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting human liver cells and other cells in the body once it gains access to the bloodstream. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very slight, many patients are called healthy carriers. This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests. While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and, occasionally, liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Those patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which are nearly identical have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the Hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and virus particles, as well as excess viral surface protein, are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority of individuals become infected at between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
Transplacental (rare)
During delivery
Post-natal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.
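The marker-by-marker readings listed above can be collected into a small lookup table. The following is a minimal sketch for illustration only, not a diagnostic tool: the dictionary keys, the condensed wording of each meaning, and the function name `interpret` are all hypothetical choices made here, and the table covers only the readings stated in the text.

```python
# Hypothetical lookup: what each serological marker indicates when present,
# condensed from the antigen and antibody lists in the text.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "high level of viral replication in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}

def interpret(positive_markers):
    """Return (marker, meaning) pairs for the positive markers, in table order."""
    unknown = set(positive_markers) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unrecognised markers: {sorted(unknown)}")
    return [(m, MARKER_MEANING[m]) for m in MARKER_MEANING if m in positive_markers]

# Example: a serum sample positive for HBsAg, HBeAg and core IgM
for marker, meaning in interpret({"HBsAg", "HBeAg", "core IgM"}):
    print(f"{marker}: {meaning}")
```

HBcAg is deliberately left out of the table because, as noted above, the core protein is not found in blood.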
Fig: Hepatitis B virus in serum
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients. Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune individuals following single-episode exposure to HBV-infected blood, for example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.
Molecular virology
Fig: Hepatitis B virus genome
Genome: circular, 3.2 kb in size, double stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to the 5' end. The other strand, the plus strand, is variable in length but less than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. The transcription and translation of these proteins is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
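To make the idea of reading frames on a circular genome concrete, the sketch below scans a toy circular sequence for open reading frames. It is a simplified illustration under stated assumptions, not the method used in this work: the function name `find_orfs_circular` and the toy sequence are hypothetical, only one strand is scanned, and an ORF is taken to run from an ATG to the first in-frame TAA, TAG or TGA. Doubling the sequence lets an ORF cross the origin, and reporting every ATG separately shows how several in-frame start codons can share one stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs_circular(genome, min_codons=3):
    """Return (start, length_nt, frame) for ATG...stop ORFs on one strand.

    The sequence is doubled so that ORFs crossing the origin are found.
    Starts are restricted to the first copy, and an ORF may not be longer
    than one genome length. length_nt includes the stop codon.
    """
    n = len(genome)
    doubled = genome.upper() * 2
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOPS:
                codons = (pos - start) // 3  # codons before the stop
                if codons >= min_codons:
                    orfs.append((start, pos + 3 - start, start % 3))
                break
    return orfs

# Toy circular sequence (not HBV data): the ORF starts at index 8 and
# wraps around the origin to reach the TAA at indices 3-5.
toy = "CCCTAAGGATGAAATTTGGG"
print(find_orfs_circular(toy))  # [(8, 18, 2)]
```

A real analysis would also scan the complementary strand and use an established ORF-finding tool; this sketch only shows why a circular genome has to be treated as wrapping around its origin.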
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
Early in infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the bloodstream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected directly by blood test; this antigen can only be isolated by analyzing an infected hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).
23
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. Serum of individuals
infected with hepatitis B contains three distinct antigen particles: a
spherical 22 nm particle; a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle; and tubular or filamentous particles
that vary in length. Of these, the Dane particle is the infective form of
the virus. Hepatitis B is normally transmitted by blood transfusion,
contaminated equipment, drug users' unsterile needles, or any body
secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
downregulation of viral gene expression and the infection of
immunologically privileged tissues. Chronic liver cell injury and the
attendant inflammatory and regenerative responses create the mutagenic and
mitogenic stimuli for the development of DNA damage that can cause
hepatocellular carcinoma. Elucidation of the immunological and virological
basis for HBV persistence may yield immunotherapeutic and antiviral
strategies to terminate chronic HBV infection and reduce the risk of its
life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication strategy
of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed with an emphasis
on past accomplishments as well as goals for the future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology, or comparative, modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query; the assembly of the model; the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g., loops); and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
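
The 30% sequence identity threshold discussed above can be checked for any
query-template pair once the two sequences are aligned. The following is a
minimal sketch, not the method used by any particular modeling server: the
function names are hypothetical, and identity is assumed here to be counted
over aligned (non-gap) columns only, which is one of several conventions in
use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the columns where neither sequence has a gap.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical_columns = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns (an assumed convention)
        aligned_columns += 1
        if q == t:
            identical_columns += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


def above_identity_threshold(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """True when the pair exceeds the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, the toy alignment `ACDE-G` against `ACQEFG` has five gap-free
columns, four of which are identical.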
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of
current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy, and a high level of integration with other
databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The
original FASTP program was designed for protein sequence similarity
searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs
in this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
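
To illustrate the Smith-Waterman local alignment idea mentioned above, the
sketch below computes only the best local alignment score by dynamic
programming. It is not the SSEARCH implementation: the function name is
hypothetical, and the simple match/mismatch scores and linear gap penalty are
assumptions chosen for brevity, whereas real searches use substitution
matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scores are illustrative assumptions, not SSEARCH defaults.
    """
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                  # start a new local alignment
                previous_row[j - 1] + substitution, # align a[i-1] with b[j-1]
                previous_row[j] + gap,              # gap in b
                current_row[j - 1] + gap,           # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best
```

Because every cell is floored at zero, a poorly matching region never drags
down the score of a good local match elsewhere in the sequences.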
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either
as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e., if you leave the two boxes empty), the complete sequence will
be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
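
Two of the parameters listed above can be computed directly from the amino
acid composition. The sketch below is a minimal illustration with
hypothetical function names, not ProtParam's own code. It assumes the
commonly used molar absorbance values at 280 nm (Tyr 1490, Trp 5500, cystine
125 M-1 cm-1) and the aliphatic index coefficients 2.9 (Val) and 3.9
(Ile + Leu) applied to mole percentages.

```python
def extinction_coefficient(sequence: str, cys_as_cystine: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from composition.

    With cys_as_cystine=True, all Cys residues are assumed to pair into
    cystines (an unpaired odd residue contributes nothing).
    """
    seq = sequence.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cys_as_cystine else 0
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile, and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A sequence containing 5 Tyr, 4 Trp, and 5 Cys residues, the composition
reported for GLCTK_HUMAN in the results, gives 29700 with all Cys counted as
half cystines and 29450 with none, matching the extinction coefficients
listed there.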
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described as a way to
improve the success rate in the prediction of the secondary structure of
proteins. Further improvements were brought about by predicting all the
sequences of a set of aligned proteins belonging to the same family. This
improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a
three-state description of the secondary structure (α-helix, β-sheet, and
coil) in a whole database containing 126 chains of non-homologous (less than
25% identity) proteins. Joint prediction with SOPMA and a neural networks
method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted
amino acids. Predictions are available by email to deleage@ibcp.fr or on a
Web page (http://www.ibcp.fr/predict.html).
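
Success rates such as those quoted above are per-residue accuracies: the
percentage of residues whose predicted state matches the observed state. The
sketch below shows that calculation together with the state composition of a
prediction. The function names and the single-letter state codes (H, E, C)
are assumptions made for illustration, not part of SOPMA.

```python
from collections import Counter


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


def state_composition(states: str) -> dict:
    """Percentage of residues assigned to each state in a prediction."""
    if not states:
        raise ValueError("state string must not be empty")
    counts = Counter(states)
    return {state: 100.0 * n / len(states) for state, n in counts.items()}
```

For instance, a prediction `HHHEECCC` compared with an observed assignment
`HHHEECCH` agrees at seven of eight positions.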
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
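
The agreement between matched Cα atoms quoted above is usually expressed as a
root-mean-square deviation (RMSD). The sketch below computes it for two lists
of matched coordinates. The function name is hypothetical, and the two
structures are assumed to be already superimposed; a real comparison would
first perform an optimal superposition.

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD (same units as the input, e.g. Å) between matched Cα atoms.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples,
    assumed to be already superimposed in a common frame.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Two atoms each displaced by 2 Å along one axis, for example, give an RMSD of
2 Å.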
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of
a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds. The
chief inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which
improve upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
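
The E-value screening step described above can be sketched as a simple
filter over a list of search hits. Everything here is illustrative: the
`Hit` record, the function name, the template identifiers in the usage note,
and the cutoff of 1e-5 are assumptions, not values prescribed by BLAST or
used in this work.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database search hit (hypothetical record layout)."""
    template_id: str
    score: float
    e_value: float


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first.

    The default cutoff is an assumed, illustrative value.
    """
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)
```

Given hits with E-values of 6.0, 6e-54, and 2e-51, only the last two pass the
cutoff, and the 6e-54 hit is ranked first. An empty result signals that no
template is reliable enough and that fold recognition should be tried
instead.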
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do
not have the ability to alter their conformation in order to improve binding
and ease movement. Conformational changes are limited by steric constraints,
and the interactions are thus said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to
as a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor
or a flexible ligand with a rigid receptor.
Fig Protein Receptor-Ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand-flexible receptor pair often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility
in the receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing for
easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and will also
open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and
hence can aid the new drug discovery process.
Receptor
A molecule on the surface of the cell that serves as a recognition or
binding site for antigens, antibodies, or other cellular or immunological
components. It is a molecule within a cell or on a cell surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change
in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a
ligand binds through the interaction of many weak noncovalent bonds formed
with the binding site of a protein, the tight binding of a ligand depends
upon a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG‡ (the free
energy of activation) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g., tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi
(ψ) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were treated
as hard spheres with dimensions corresponding to their van der Waals radii.
The angles that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
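
Each point on a Ramachandran plot is a pair of backbone torsion angles, and
each torsion angle is the dihedral defined by four consecutive atoms: phi by
C(i-1), N(i), Cα(i), C(i), and psi by N(i), Cα(i), C(i), N(i+1). The sketch
below computes one such dihedral from four coordinates using the standard
atan2 formulation. The function and helper names are hypothetical, and
reading the atoms from a structure file is left out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)  # central bond, the rotation axis
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_length = math.sqrt(_dot(b2, b2))
    if b2_length == 0.0:
        raise ValueError("the two central points must be distinct")
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_length
    return math.degrees(math.atan2(y, x))
```

Calling the function once with the phi atoms and once with the psi atoms of
a residue gives the coordinates of that residue on the plot.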
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension and
lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through
high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having
a huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
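
The idea of gradient optimization can be shown on the smallest possible
system: two atoms interacting through a Lennard-Jones (van der Waals)
potential, minimized over their separation by steepest descent. This is a
toy sketch, not a force field or the minimizer of any modeling package. The
function names, the reduced units (epsilon = sigma = 1), the fixed step
size, and the convergence tolerance are all assumptions.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy of two atoms at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r_start: float, step: float = 0.01,
                        tolerance: float = 1e-8,
                        max_iterations: int = 10000) -> float:
    """Steepest descent on r: move against the gradient until it vanishes."""
    r = r_start
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break  # net force is (nearly) zero: a local minimum
        r -= step * gradient
    return r
```

Starting from a separation of 1.5, the descent settles at the nearby minimum
r = 2^(1/6) sigma, where the energy equals -epsilon. As the text notes, such
a procedure only finds the nearest local minimum; it cannot cross energy
barriers.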
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                         Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)    74      14.1%
Arg (R)    33       6.3%
Asn (N)    11       2.1%
Asp (D)    21       4.0%
Cys (C)     5       1.0%
Gln (Q)    32       6.1%
Glu (E)    28       5.4%
Gly (G)    51       9.8%
His (H)    16       3.1%
Ile (I)    15       2.9%
Leu (L)    81      15.5%
Lys (K)    10       1.9%
Met (M)    12       2.3%
Phe (F)    10       1.9%
Pro (P)    28       5.4%
Ser (S)    27       5.2%
Thr (T)    22       4.2%
Trp (W)     4       0.8%
Tyr (Y)     5       1.0%
Val (V)    38       7.3%
Pyl (O)     0       0.0%
Sec (U)     0       0.0%
(B)         0       0.0%
(Z)         0       0.0%
(X)         0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
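The two extinction coefficients reported by ProtParam can be reproduced from the residue counts. The sketch below assumes the usual ProtParam rule for 280 nm (5500 per Trp, 1490 per Tyr, 125 per cystine, in M-1 cm-1) and reads the reported molecular weight as 55252.6 with its decimal point in place; the function name is hypothetical.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M-1 cm-1.

    Assumed per-residue contributions: Trp 5500, Tyr 1490, cystine 125.
    A cystine is a pair of Cys residues, so an odd Cys is left unpaired.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * 5500 + n_tyr * 1490 + n_cystine * 125


# Counts taken from the amino acid composition: Trp 4, Tyr 5, Cys 5.
molecular_weight = 55252.6  # assumed reading of the reported value

eps_paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
eps_free = extinction_coefficient(4, 5, 5, all_cys_paired=False)

# Absorbance of a 1 g/l (0.1%) solution is the coefficient divided by the mass.
abs_paired = round(eps_paired / molecular_weight, 3)
abs_free = round(eps_free / molecular_weight, 3)

print(eps_paired, abs_paired)
print(eps_free, abs_free)
```

Both pairs of numbers agree with the ProtParam output, which is a quick consistency check on the composition and molecular weight.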
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
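The pairwise scores in this table are percent identities. A minimal sketch of that calculation is given below, assuming the score is the number of identical positions divided by the number of aligned positions where neither sequence has a gap. The function name is hypothetical, and the two fragments are only the first 30 aligned columns of HBEAG_HBVA4 and HBEAG_ASHV from the alignment, so the value differs from the full-length score of 65.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# First 30 aligned columns of two HBeAg sequences (fragment only).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

score = percent_identity(hbva4, ashv)
print(f"{score:.1f}% identity over the fragment")
```

Very low scores such as the value of 3 between GLCTK_HUMAN and every HBeAg sequence indicate that the two proteins share essentially no sequence similarity.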
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
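The guide tree is written in Newick format, so the terminal branch length of each sequence can be read directly from the string. The sketch below assumes the punctuation lost in the listing is restored as commas, colons and decimal points (for example 059519 read as 0.59519); the helper name is hypothetical, and only leaf branches are extracted rather than the full tree.

```python
import re

# Guide tree with the assumed punctuation restored.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Map each leaf label to the length of its terminal branch.

    A leaf label always follows "(" or ",", whereas the length of an
    internal branch follows ")", so internal branches are skipped.
    """
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
for name, length in sorted(leaves.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {length:.5f}")
```

The longest terminal branch belongs to GLCTK_HUMAN, which is consistent with its very low pairwise scores against the nine HBeAg sequences.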
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations of these angles are possible for a polypeptide. Because the bond lengths and bond angles along the backbone are nearly constant, these two dihedral angles largely determine the backbone conformation.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
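Each point in a Ramachandran plot is a pair of dihedral angles, and a dihedral angle is defined by four consecutive atoms (for φ: C of the previous residue, N, CA, C; for ψ: N, CA, C, N of the next residue). The sketch below shows one common way to compute such an angle from coordinates. It is an illustration only, not the procedure used by PROCHECK; the function names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the central bond lies along the z axis.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
plus90 = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
minus90 = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 1.0))
print(cis, plus90, minus90)
```

Applying this function to the backbone atoms of every residue in a PDB file gives the (φ, ψ) pairs that a Ramachandran plot displays.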
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          No. of residues   %
Residues in most favoured regions [A,B,L]                      990         85.6%
Residues in additional allowed regions [a,b,l,p]               104          9.0%
Residues in generously allowed regions [~a,~b,~l,~p]            11          1.0%
Residues in disallowed regions                                  51          4.4%
                                                              ----        ------
Number of non-glycine and non-proline residues                1156        100.0%
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127 (59)
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
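The percentages in the plot statistics follow directly from the residue counts, and the PROCHECK guideline of more than 90% of residues in the most favoured regions can be checked in the same step. A minimal sketch, using the counts reported above (the variable names are hypothetical):

```python
# Residue counts from the PROCHECK plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1)
               for region, n in counts.items()}

# Guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, value in percentages.items():
    print(f"{region:20s} {value:5.1f}%")
print("meets the 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of that guideline, so the affected residues would need closer inspection or further refinement.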
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100
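The search-space size reported in the docking log can be checked with a few lines of arithmetic. The sketch below assumes that the log line "Iterate[278121] x FFT[642448]" lost its separators during extraction and originally read Iterate[27, 812, 1] x FFT[64, 24, 48]. That reading is an assumption, but it agrees with the 812 receptor and 1 ligand rotational increments, the 27 listed R values, and the 21924 3D FFTs printed elsewhere in the log. The meaning attached to each dimension in the comments is likewise an assumption.

```python
from math import prod

# Assumed reading of the log line "Iterate[278121] x FFT[642448]".
# The comma positions are a guess, checked against totals printed in the log.
iterate_dims = (27, 812, 1)  # assumed: distance steps, receptor rotations, ligand rotations
fft_dims = (64, 24, 48)      # assumed: grid dimensions of each 3D FFT

n_ffts = prod(iterate_dims)               # number of 3D FFTs performed
orientations = n_ffts * prod(fft_dims)    # total 6D orientations sampled

print(n_ffts)        # 21924, matches "Done 21924 3D FFTs"
print(orientations)  # 1616412672, matches "Total 6D space"
```

Both products match the totals in the log, which supports this reading of the garbled line.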
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, each has an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of evolution and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we performed homology modelling, and we present a theoretical
model built using available online modelling tools.
In our study, the HBeAg (Glycerate kinase) protein encoded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding on a large issue.
Phylogenetics is the field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, and hence we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M Prescott, John P Harley and Donald A Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc Natl
Acad Sci 98: 14796-14801 [PubMed]
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J Mol Biol 311: 395-408
[PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res 25:
3389-3402 [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro:
An integrated documentation resource for protein families, domains, and
functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Autoimmune: Autoimmune hepatitis
Alcohol
Drugs: Methyl-dopa, Nitrofurantoin, Isoniazid, Ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally
mimic chronic hepatitis
Viral hepatitis
A virus is a particle which is smaller than a bacterium and carries its
genetic information as DNA or RNA. This genetic material allows the virus
to infect bacteria or living cells and set up the machinery to reproduce
itself, leading to destruction of the cell in which it resides. To date,
five viruses, labeled A through E, have been identified which appear to
cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by
direct injection into the bloodstream (through any method of injection
under the skin). The term viral hepatitis describes any one of the
illnesses caused by the five viruses mentioned and consists of an infection
of liver cells which leads to damage of the liver, over days in some cases
but over many years in others. Thirty years ago none of the hepatitis
viruses had been identified. In the 1960s transfusion-related viral
hepatitis was extremely common, with 30% of patients receiving blood
products becoming infected. By 1970 a blood test for the Australia antigen
had been developed, which appeared to identify those infected with one
hepatitis virus, which we now call hepatitis B. The investigator who
discovered the Australia antigen, the protein which makes up the coat of
the virus and which is now called the hepatitis B surface antigen (HBsAg),
was awarded the Nobel prize. Our understanding of viral hepatitis has grown
tremendously since the discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis.
Two are herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus
[EBV]) and 9 are hepatotropic viruses. EBV and CMV cause mild,
self-resolving forms of hepatitis with no permanent hepatic damage. Both
viruses cause the typical infectious mononucleosis syndrome of fatigue,
nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized.
Hepatitis G and TTV (transfusion transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis
E (formerly called enterically transmitted NANB hepatitis) are transmitted
by fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A
non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than Hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases. Virus enters via the
gut, replicates in the alimentary tract and spreads to infect the liver,
where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women, in whom the mortality rate is high (up to 40%). Similar to
hepatitis A, the virus replicates in the gut initially before invading the
liver, and virus is shed in the stool prior to the onset of symptoms.
Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet about its epidemiology.
The incidence of infection appears to be low in first world countries.
Hepatitis C
Putative Togavirus related to the Flavi and Pesti viruses.
Thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees. Incubation
period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure, which
may lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires Hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with Hepatitis B. It increases the severity of liver disease in
Hepatitis B carriers. The virus particle is 36 nm in diameter and is
encapsulated with HBsAg derived from HBV; delta antigen is associated with
virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called Hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to
be a major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains
access to the blood stream. One of the most interesting features of the
hepatitis B virus is that the virus itself does not damage the liver; the
damage is caused by the individual's own immune system attacking the
virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that although they
may transmit the disease to others, they have normal-appearing livers and
normal liver function tests. While many individuals remain healthy for many
years or a lifetime, others develop chronic hepatitis, cirrhosis and
occasionally liver cell cancer. These outcomes are linked to the virus and
its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with
inflammation) do so on account of the body's normal inclination to attack
the foreign proteins contained in viruses and in the cells in which the
viruses are found. This process, called the immune response, determines the
pace and the severity of the liver cell injury in this condition and will
be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses
which are nearly identical have been identified in Eastern woodchucks,
ground squirrels and Peking ducks. The members of this virus family, termed
the Hepadna viruses, have life cycles similar to that observed in man and
can serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver,
and virus particles, as well as excess viral surface protein, are shed in
large amounts into the blood. Viraemia is prolonged and the blood of
infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic Persistent Hepatitis - the virus persists, but there is minimal
liver damage.
Chronic Active Hepatitis - there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Post natal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates that the individual, including a chronic carrier, has been
exposed to HBV.
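The marker-by-marker reading described above can be summarised as a small lookup. The following is a minimal sketch, not a diagnostic tool: the function name `interpret_hbv_serology` and its boolean parameters are hypothetical, and each message simply restates the interpretation given in the list above.

```python
def interpret_hbv_serology(hbsag=False, hbeag=False, anti_hbs=False,
                           anti_hbe=False, core_igm=False, core_igg=False):
    """Map positive serological markers to the interpretations listed in the text.

    Hypothetical helper for illustration only; every argument is True when the
    corresponding marker is detected in serum.
    """
    findings = []
    if hbsag:
        findings.append("HBsAg: virus replication is occurring in the liver")
    if hbeag:
        findings.append("HBeAg: high level of viral replication")
    if anti_hbs:
        findings.append("anti-HBs: immunity following infection")
    if anti_hbe:
        findings.append("anti-HBe: viral replication falling, low infectivity")
    if core_igm:
        findings.append("core IgM: recent infection")
    if core_igg:
        findings.append("core IgG: exposure to HBV")
    return findings


# Example: a profile with surface antibody and core IgG only.
for line in interpret_hbv_serology(anti_hbs=True, core_igg=True):
    print(line)
```

Keeping each marker as an independent flag mirrors the text, which interprets the markers one at a time rather than as fixed combinations.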
Fig: Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms
of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and
occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others who do not develop significant symptoms following exposure may not
be aware of the infection. These individuals may also overcome the
infection completely and develop immunity, but frequently become chronic
carriers.
The outcome of hepatitis B infection depends to a great extent on the
status of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their
on-going infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to its 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
The transcription and translation of these proteins is achieved through the
use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of
polyadenylation, and mark a specific transcript for encapsidation into the
nucleocapsid.
Fig: Hepatitis B virus genome
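To illustrate how several proteins can arise from one reading frame through multiple in-frame start codons on a circular genome, here is a minimal sketch of a forward-strand ORF scan. The function name `find_orfs_circular`, the `min_codons` parameter and the toy sequences are hypothetical; a real analysis of the HBV genome would use an annotated sequence and a dedicated tool.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs_circular(genome, min_codons=3):
    """Return (start_index, length_in_nt) for ORFs on the forward strand.

    The genome is treated as circular: a reading frame may run past the end
    of the string and continue from the beginning, at most once around.
    Every ATG is reported separately, so in-frame start codons that share
    one stop codon appear as nested ORFs.
    """
    n = len(genome)
    doubled = genome + genome  # lets slices wrap around the origin
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != START:
            continue
        # Walk codon by codon, never further than one full turn.
        for pos in range(start, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOPS:
                length = pos + 3 - start
                if length // 3 >= min_codons:
                    orfs.append((start, length))
                break
    return orfs


# Two in-frame ATGs sharing one stop codon give two nested ORFs.
print(find_orfs_circular("ATGAAAATGCCCTAA"))
# An ORF that starts near the end and wraps around the origin.
print(find_orfs_circular("CCCTAAATGAAA"))
```

Only the forward strand is scanned because the text states that all four reading frames run in one direction.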
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell
which is capable of supporting its replication. Although hepatocytes are
known to be the most effective cell type for replicating HBV, other types
of cells in the human body have been found to support replication to a
lesser degree.
The initial steps following HBV entry are not clearly defined, although it
is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be
identified (Garces HBVP). The DNA then enters the nucleus, where it is
known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer than genome length RNA called the pregenome, and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral
mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface
antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where
viral DNA synthesis eventually occurs. At the same time as capsid
formation, the RNA-P protein complex is packaged and reverse transcription
begins.
Early after infection, the DNA is recirculated to the nucleus, where the
process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present, usually outnumbering the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere,
are often referred to as a group named surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the
filament also includes large hepatitis B surface protein. The absence of
the hepatitis B core, polymerase and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles
present the same sites as the virion, they induce a significant immune
response and are thought to be disadvantageous for the virus. However, it
is also believed that the presence of high levels of non-infectious
particles may allow the infectious viral particles to travel through the
blood stream undetected by antibodies (Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B
surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen (Au
antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope which may be responsible for
triggering the immune response. Despite the high antigenicity and
prevalence of these particles, the immune system appears basically
oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185 amino acid protein expressed
in the cytoplasm of infected cells, and it is closely associated with
nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood
test. If found, it is usually indicative of complete virus particles in
circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. Serum of individuals
infected with hepatitis B contains 3 distinct antigen particles: a
spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles
that vary in length. The Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires
further analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete down
regulation of viral gene expression and the infection of immunologically
privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic
stimuli for the development of DNA damage that can cause hepatocellular
carcinoma. Elucidation of the immunological and virological basis for HBV
persistence may yield immunotherapeutic and antiviral strategies to
terminate chronic HBV infection and reduce the risk of its life-threatening
sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still
poorly understood. Studies on how the virus evades the immune response to
cause prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The
state of knowledge in this very active field is therefore reviewed, with an
emphasis on past accomplishments as well as goals for the future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural
templates. The procedure involves the identification of possible templates
that have a clear sequence relationship to the query, the assembly of the
model, the prediction of regions of the structure that are likely to have
different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent
differences between the template and query structures. As mentioned above,
homology modeling figures heavily as a rationale for structural genomics
initiatives, under the stated assumption that accurate models can be built
for query sequences that have a greater than 30% sequence identity with
their best template.
The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
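The 30% rule discussed above can be checked for a given query-template alignment with a short calculation. The sketch below is illustrative only: the function name `percent_identity` and the toy sequences are hypothetical, and identity is computed over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumes both strings come from the same pairwise alignment and therefore
    have equal length. Other denominators (e.g. the shorter sequence length)
    are also used in practice and give different values.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    aligned_columns = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


# Toy alignment: 7 gap-free columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MRTLAY-AK")
print(round(identity, 1))
print("above 30% threshold:", identity > 30.0)
```

A template falling below the threshold is not necessarily unusable, but by the argument above its alignment, and hence the resulting model, deserves more scrutiny.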
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on
the use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring
but not on orienting the ligand during sampling. In contrast, direct
approaches incorporate chemical information explicitly, thus actively
guiding the orientation of the ligand during sampling. Direct approaches
can be further divided into protein-based, mapping-based and ligand-based
approaches, to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that
making optimal use of chemical information represents an efficient
knowledge-based strategy for improving binding affinity estimations, ligand
binding-mode predictions and virtual screening enrichments obtained from
protein-ligand docking.
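The classification scheme described above can be written down as a small data structure. This is only a sketch of the taxonomy as stated in the text; the names `GuidedDocking` and `classify_guided_docking` and the string labels for the feature source are hypothetical.

```python
from enum import Enum
from typing import Optional


class GuidedDocking(Enum):
    """Categories of guided docking as described in the text."""
    INDIRECT = "chemical information affects scoring only"
    DIRECT_PROTEIN_BASED = "guides sampling, features derived from the protein"
    DIRECT_MAPPING_BASED = "guides sampling, features derived from mapping"
    DIRECT_LIGAND_BASED = "guides sampling, features derived from ligands"


# Hypothetical labels for the source of the chemical features.
_DIRECT_BY_SOURCE = {
    "protein": GuidedDocking.DIRECT_PROTEIN_BASED,
    "mapping": GuidedDocking.DIRECT_MAPPING_BASED,
    "ligand": GuidedDocking.DIRECT_LIGAND_BASED,
}


def classify_guided_docking(guides_sampling: bool,
                            feature_source: Optional[str] = None) -> GuidedDocking:
    """Classify an approach by whether it steers ligand orientation during sampling."""
    if not guides_sampling:
        return GuidedDocking.INDIRECT
    try:
        return _DIRECT_BY_SOURCE[feature_source]
    except KeyError:
        raise ValueError("direct approaches need a source: protein, mapping or ligand") from None


print(classify_guided_docking(False).name)
print(classify_guided_docking(True, "ligand").name)
```

The single boolean captures the distinction the text draws between the two top-level classes: whether chemical information influences only scoring or also the orientation of the ligand during sampling.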
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some
major challenges ahead, which are outlined here as well. Approaches to
address some of these challenges and the latest developments in the area
are presented. Some aspects of the assessment of docking program
performance are discussed. A number of successful applications of
structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy, and a high level of integration with other
databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
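To make the idea of an optimal local alignment concrete, the following is a minimal sketch of the Smith-Waterman recurrence in Python. It is not the SSEARCH implementation: the match, mismatch, and linear gap scores are illustrative assumptions, whereas production tools use substitution matrices and more elaborate gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b).

    The scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


result = smith_waterman("TTACGTTT", "GGACGGG")
```

With these toy sequences the only shared segment is ACG, so the sketch reports that segment as the best local alignment. Judging whether such a score is significant is a separate statistical step, which the sketch does not attempt.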
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
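As an illustration of how one of these parameters is derived from composition alone, the sketch below computes the aliphatic index using the formula given in the ProtParam documentation: X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The residue counts are those reported for GLCTK_HUMAN in the results section of this report; the function and variable names are hypothetical.

```python
# Residue counts for GLCTK_HUMAN (Q8IVS8) as listed in the ProtParam results.
GLCTK_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def aliphatic_index(counts):
    """Aliphatic index from residue counts (coefficients 2.9 and 3.9)."""
    total = sum(counts.values())

    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


ai = aliphatic_index(GLCTK_COUNTS)
print(f"Aliphatic index: {ai:.2f}")
```

The value is only as reliable as the composition it is fed, so it should be read as a consistency check on the ProtParam output rather than as an independent measurement.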
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet, and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main-chain and side-chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-
recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
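The E-value rule of thumb described above can be sketched as a simple filter-and-rank step. Everything in this example is an assumption made for illustration: the Hit record, the template identifiers, the E-value cutoff of 1e-5, and the minimum coverage of 0.5 are hypothetical and are not taken from BLAST or from this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str     # hypothetical identifier of a candidate template
    e_value: float       # E-value reported by the database search
    aligned_length: int  # number of query residues covered by the hit


def rank_templates(hits, query_length, max_e_value=1e-5, min_coverage=0.5):
    """Drop hits with a poor E-value or low coverage, then rank the rest."""
    kept = [
        h for h in hits
        if h.e_value <= max_e_value
        and h.aligned_length / query_length >= min_coverage
    ]
    # Lower E-value first; longer aligned region breaks ties.
    return sorted(kept, key=lambda h: (h.e_value, -h.aligned_length))


# Hypothetical search results for a 200-residue query.
hits = [
    Hit("tmplA", 1e-50, 180),
    Hit("tmplB", 1e-20, 150),
    Hit("tmplC", 5.0, 40),    # rejected: poor E-value
    Hit("tmplD", 1e-30, 30),  # rejected: covers too little of the query
]
ranked = rank_templates(hits, query_length=200)
print([h.template_id for h in ranked])
```

A filter like this only narrows the candidate list; the marginal cases mentioned above (similar function, homologous operon) still call for manual judgment.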
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor, or a
flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
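Each point on the plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below shows one common way to compute such an angle from Cartesian coordinates. The helper names and the test coordinates are hypothetical, and the code is a plain-Python illustration rather than the routine used by any particular validation server.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the central bond lies along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying the function to every residue of a model yields the (φ, ψ) pairs that are then compared with the sterically allowed regions.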
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
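The idea of moving atoms downhill along the gradient can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones potential. This is a toy sketch under stated assumptions (reduced units with epsilon = sigma = 1, a fixed step size, steepest descent); real minimizers work on full force fields and typically use more robust schemes.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms separated by r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


# Start from a stretched pair and relax to the nearest minimum.
r_min = minimize(1.5)
print(f"r_min = {r_min:.4f}, energy = {lj_energy(r_min):.4f}")
```

The pair relaxes to the analytical minimum at r = 2^(1/6) sigma. On a real protein the same procedure stops at whichever local minimum is nearest to the starting conformation, which is the limitation noted above.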
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
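The two extinction coefficients above can be reproduced from the composition. The sketch below assumes the relation documented for ProtParam, in which the molar extinction coefficient at 280 nm is 1490 per Tyr, 5500 per Trp, and 125 per cystine (a pair of Cys residues). The function name is hypothetical; the residue counts and molecular weight are the ones reported for GLCTK_HUMAN.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    # Each cystine is formed from two Cys residues; an odd Cys stays unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


MOLECULAR_WEIGHT = 55252.6  # as reported by ProtParam for the 523 residues

ext_paired = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_paired=True)
ext_reduced = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_paired=False)

# Absorbance of a 1 g/l solution is the coefficient divided by the molecular weight.
abs_paired = ext_paired / MOLECULAR_WEIGHT
abs_reduced = ext_reduced / MOLECULAR_WEIGHT
print(ext_paired, round(abs_paired, 3))
print(ext_reduced, round(abs_reduced, 3))
```

With 5 Tyr, 4 Trp, and 5 Cys the sketch gives the same coefficients and absorbances as the ProtParam output, which is a quick way to confirm that the composition table and the extinction values belong to the same sequence.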
Secondary structure prediction
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
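The percentages in a SOPMA summary are simple tallies of the per-residue state letters. As a minimal sketch, the function below counts the states in a prediction string and converts them to percentages. The per-residue prediction string is not reproduced in this report, so the example rebuilds a string with the reported counts (h, e, t, c) purely to check the arithmetic; the function name is hypothetical.

```python
from collections import Counter


def state_percentages(states):
    """Percentage of each secondary-structure state letter in a prediction."""
    counts = Counter(states)
    n = len(states)
    return {state: round(100.0 * c / n, 2) for state, c in counts.items()}


# Stand-in string with the reported counts; the residue order is not real.
reported = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
pct = state_percentages(reported)
print(pct)
```

The four percentages agree with the summary above and the four counts add up to the sequence length of 523.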
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
62
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
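As a rough cross-check of how similar two rows of the alignment are, percent identity can be computed directly from the aligned strings. The sketch below is illustrative only: the function name `percent_identity` and the choice to skip every column in which either sequence has a gap are assumptions, not the scoring used by the alignment program. The two rows are the HBVA4 and ASHV rows of the block ending at positions 85 and 86.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Rows copied from the alignment; the 20-column gap run is written as "-" * 20.
hbva4 = "--LPSDFFPSVRDLLDTASALYREALES" + "-" * 20 + "PEHCSPHHTALR"
ashv = "--LPLDFFPELNALVDTATALYEEELTG" + "-" * 20 + "REHCSPHHTAIR"

identity = percent_identity(hbva4, ashv)
print(f"HBVA4 vs ASHV identity in this block: {identity:.1f}%")
```

A whole-sequence figure would need the rows of every block concatenated per sequence before calling the function.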
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
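The guide tree is written in Newick notation, so it can be read programmatically. The following is a minimal sketch, assuming the separators lost in extraction are restored as a colon before each branch length, commas between siblings, a decimal point after the leading zero, and a closing semicolon. `parse_newick` and `leaf_depths` are hypothetical helper names introduced here for illustration; they are not part of any alignment package.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


depths = leaf_depths(parse_newick(GUIDE_TREE))
for leaf, d in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {d:.5f}")
```

Sorting by root-to-leaf distance puts the HBV genotype A entries first and the human glycerate kinase sequence last, which matches its position as the most distant branch in the tree.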
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - save - layer (pdb)
Choose icon - File - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window will open; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - File - save - layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, with one point plotted for each residue. Because the peptide bond is essentially planar, these two angles largely determine the backbone conformation at each residue, so the plot shows which residues fall in sterically favoured, allowed, or disallowed regions.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
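The φ and ψ values plotted on a Ramachandran plot are ordinary dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from Cartesian coordinates. It is an illustration under stated assumptions, not the code used by PROCHECK or SAVS; the function name `dihedral` and the test coordinates are made up for the example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (not taken from any structure in this report).
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(right_angle, trans)
```

For a real model, the four points would be the atom coordinates read from the PDB file for each residue, giving one (φ, ψ) pair per residue to plot.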
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Score     %age
Residues in most favoured regions [A,B,L]                    990    85.6%
Residues in additional allowed regions [a,b,l,p]             104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11     1.0%
Residues in disallowed regions                                51     4.4%
                                                            ----   ------
Number of non-glycine and non-proline residues              1156   100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
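The percentages in the PROCHECK summary follow directly from the residue counts, which makes them easy to re-derive and compare with the 90% guideline. The snippet below is a small arithmetic check using only the counts reported above; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues form the denominator.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Guideline quoted in the summary: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("meets 90% guideline:", meets_guideline)
```

On these figures the model falls short of the guideline, with 4.4% of residues in disallowed regions, so the affected residues would be the natural place to start any refinement.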
Docking result (by Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS       Bumps
----   --------- --------- --------- --------- --------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
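The size of the search reported in the Hex log can be reproduced from the other figures in the same log, which is a useful sanity check when reading it. The sketch below assumes that the bracketed values in the "Total 6D space" line read as Iterate[27, 812, 1] and FFT[64, 24, 48] (the separators were lost in extraction), and that the distance range runs from 0.00 to 19.50 in steps of 0.75. All variable names are illustrative.

```python
# Radial steps: 0.00 to 19.50 in steps of 0.75, computed in hundredths
# so that the step count is exact integer arithmetic.
radial_steps = [r / 100 for r in range(0, 1950 + 1, 75)]

# Rotational increments reported for the initial N=16 scan.
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of FFT[642448] as a 64 x 24 x 48 grid.
fft_grid = (64, 24, 48)

n_ffts = len(radial_steps) * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print("radial steps:", len(radial_steps))
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
```

Under these assumptions the product agrees with the two totals printed in the log, 21924 3D FFTs and 1616412672 orientations, which supports the reading of the bracketed values.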
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that,
although they are all closely related, they play an important role in
survival in different species. It would be interesting to look more closely
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of its evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
The HBeAg (glycerate kinase) protein studied here is considered one of the
secondary causes of the pathogenicity of HBV. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity, as
a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding on a large issue. Phylogenetics is the
field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. This
includes methods for collecting and analyzing data, as well as the
interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
blood test called the Australia antigen was developed, which appeared to
identify those infected with one hepatitis virus, which we now call
hepatitis B. The investigator who discovered the Australia antigen, the
protein which makes up the coat of the virus and which is now called the
hepatitis B surface antigen (HBsAg), was awarded the Nobel Prize. Our
understanding of viral hepatitis has grown tremendously since the discovery
of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine
are hepatotropic viruses. EBV and CMV cause mild, self-resolving forms of
hepatitis with no permanent hepatic damage. Both viruses cause the typical
infectious mononucleosis syndrome of fatigue, nausea and malaise.
Of the nine human hepatotropic viruses, only five are well characterized.
Hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E
(formerly called enterically transmitted NANB hepatitis) are transmitted by
fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A,
non-B hepatitis) and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases. The virus enters via
the gut, replicates in the alimentary tract and spreads to infect the liver,
where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women, in whom the mortality rate is high (up to 40%). Similar to
hepatitis A: the virus replicates in the gut initially, before invading the
liver, and virus is shed in the stool prior to the onset of symptoms.
Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet.
The incidence of infection appears to be low in first world countries.
Hepatitis C
Originally described as a putative togavirus related to the flaviviruses and
pestiviruses; it is now classified in the family Flaviviridae.
Enveloped, with a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees.
Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of
individuals develop chronic infection following exposure, with a risk of:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B. It increases the severity of liver disease in
hepatitis B carriers. The virus particle is 36 nm in diameter and is
encapsulated with HBsAg derived from HBV; delta antigen is associated with
the virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis but is no longer believed to be a
major agent of liver disease. It has been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the blood stream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients
are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver
cell cancer. These outcomes are linked to the virus and its effects, although
it is unlikely that the virus directly causes cancer. Those patients who
develop hepatitis (damage to liver cells with inflammation) do so on account
of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called
the immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which
are nearly identical have been identified in Eastern woodchucks, ground
squirrels and Peking ducks. The members of this virus family, termed the
hepadnaviruses, have life cycles similar to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans) and
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles as well as excess viral surface protein are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies and young children
immunocompromised patients
males (more often than females)
The virus persists in the hepatocytes and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma (HCC)
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma.
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Fig. Hepatitis B virus in serum
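The marker-by-marker rules listed above can be restated as a small lookup. The following is only an illustrative sketch: the function name and argument names are hypothetical, and the messages simply repeat the statements in the lists above rather than adding any diagnostic logic.

```python
def interpret_hbv_serology(*, hbsag=False, hbeag=False, anti_hbs=False,
                           anti_hbe=False, core_igm=False, core_igg=False):
    """Return the textbook meaning of each serum marker that is present.

    Each argument is True when the marker is detected in serum.
    HBcAg is omitted because, as stated above, it is not found in blood.
    """
    rules = [
        (hbsag, "HBsAg: virus replication is occurring in the liver"),
        (hbeag, "HBeAg: high level of viral replication"),
        (anti_hbs, "anti-HBs: immunity following infection"),
        (anti_hbe, "anti-HBe: low infectivity"),
        (core_igm, "core IgM: recent infection"),
        (core_igg, "core IgG: exposure to HBV"),
    ]
    # Keep only the statements whose marker was detected.
    return [text for present, text in rules if present]


# Example: a resolved infection typically shows anti-HBs and core IgG.
resolved = interpret_hbv_serology(anti_hbs=True, core_igg=True)
```

The function reports each marker independently; combining markers into a single clinical picture is deliberately left out of the sketch.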
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others who do not develop significant symptoms following exposure may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic carriers
or those with chronic hepatitis B are not aware of their ongoing infection,
although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions.
Fig. Hepatitis B virus genome
The minus strand is unit length and has a protein covalently attached to the
5' end. The other strand, the plus strand, is variable in length but is less
than unit length and has an RNA oligonucleotide at its 5' end. Thus neither
DNA strand is closed, and circularity is maintained by cohesive ends
(Strauss 2002). The four overlapping open reading frames (ORFs) in the
genome are responsible for the transcription and expression of seven
different hepatitis B proteins. The transcription and translation of these
proteins is through the use of multiple in-frame start codons. The HBV
genome also contains elements that regulate transcription, determine the
site of polyadenylation and mark a specific transcript for encapsidation
into the nucleocapsid.
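To illustrate how multiple in-frame start codons let one reading frame yield several proteins, the sketch below lists every in-frame ATG in an ORF and the product that would start at each. The toy sequence and the function names are hypothetical and are not taken from the HBV genome; the point is only that all products end at the same stop codon and differ in their N-terminal extension.

```python
def in_frame_starts(orf, start_codon="ATG"):
    """Return the 0-based codon indices of every in-frame start codon."""
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return [idx for idx, codon in enumerate(codons) if codon == start_codon]


def nested_products(orf):
    """Return the coding sequence that begins at each in-frame start codon.

    Every product runs to the end of the ORF, so the shorter ones are
    contained within the longer ones.
    """
    return [orf[3 * idx:] for idx in in_frame_starts(orf)]


# Hypothetical toy ORF (not an HBV sequence): three in-frame ATGs, one stop.
toy_orf = "ATGGCTATGAAACCCATGTTTTAA"
starts = in_frame_starts(toy_orf)
products = nested_products(toy_orf)
```

Here `starts` holds three codon positions and `products` holds three nested coding sequences of decreasing length, mirroring the way one ORF can encode a family of related proteins.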
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which
is capable of supporting its replication. Although hepatocytes are known to be
the most effective cell type for replicating HBV, other types of cells in the
human body have been found to be able to support replication to a lesser
degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through
recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum, and
the proteins that are destined to become HBV surface antigens in the viral
envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation, the
RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where
the process is repeated, resulting in the accumulation of 10 to 30 molecules
of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of
hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid,
which is made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present, which
usually outnumber the virions in a ratio of 100:1. These two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the
infection. Since the non-infectious particles present the same sites as the
virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering
the immune response. Despite the high antigenicity and prevalence of these
particles, the immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. A 185 amino acid protein, it is expressed
in the cytoplasm of infected cells and is closely associated with
nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains three distinct antigen particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. Of
these, only the Dane particle is the infective form of the virus. Hepatitis B
is normally transmitted by blood transfusion, contaminated equipment, drug
users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure
with cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver
cancer is still elusive. The state of knowledge in this very active field is
therefore reviewed, with an emphasis on past accomplishments as well as
goals for the future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have a greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
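The 30% rule can be checked on any pairwise alignment by counting identical residues. The sketch below is a minimal illustration, assuming identity is computed over aligned (non-gap) columns only; other conventions for the denominator exist, and the two aligned strings are hypothetical toy sequences, not data from this work.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns that pair two residues.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (gaps shown as '-').
query_aln = "MKT-AYIAKQR"
template_aln = "MKTLAYLAK-R"
identity = percent_identity(query_aln, template_aln)
usable_template = identity > 30.0   # the rule of thumb quoted above
```

In practice the aligned strings would come from the BLAST or FASTA output for the chosen template.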
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction into ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4), from human. Primary accession
number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic
activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
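A sequence retrieved from Swiss-Prot arrives in FASTA format: a header line beginning with ">" followed by one or more lines of residues. The sketch below shows one way such a record might be read; the record shown is a hypothetical placeholder, not the real entry for the protein above, and `parse_fasta` is an illustrative helper rather than part of any database tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # store the previous record
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)           # sequence lines are concatenated
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical placeholder record; the residues are made up for illustration.
example = """>toy|X00000|EXAMPLE hypothetical protein
MAAALQVLPR
LARAPLHPLL
"""
records = parse_fasta(example)
sequence = records["toy|X00000|EXAMPLE hypothetical protein"]
```

The resulting `sequence` string is the form that tools such as ProtParam accept as raw sequence input.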
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts) and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
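The Smith-Waterman algorithm mentioned above finds the best-scoring local alignment by dynamic programming. The following is a minimal sketch, not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values are arbitrary illustrative choices), whereas production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the matrix; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences sharing the local motif ACGT.
result = smith_waterman("TTACGTAA", "GGACGTCC")
```

The raw score alone does not say whether a match is meaningful; that is the role of the similarity statistics the package computes.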
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
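Two of the listed parameters follow from simple composition formulas and can be sketched directly. The code below assumes the commonly documented definitions: aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and a 280 nm extinction coefficient of 5500 per Trp, 1490 per Tyr and 125 per cystine (in M^-1 cm^-1). The function names and the test peptide are hypothetical; half-life and instability index need lookup tables and are not reproduced here.

```python
from collections import Counter


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    n = len(seq)

    def mole_percent(residue):
        return 100.0 * counts[residue] / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines_formed=True):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    If cystines_formed is True, every pair of Cys is assumed to form a
    disulfide bond (an assumption; the real number may be lower).
    """
    counts = Counter(seq.upper())
    cystines = counts["C"] // 2 if cystines_formed else 0
    return 5500 * counts["W"] + 1490 * counts["Y"] + 125 * cystines


# Hypothetical eight-residue peptide used only to exercise the formulas.
peptide = "AVILWYCC"
ai = aliphatic_index(peptide)
ec_oxidised = extinction_coefficient(peptide)
ec_reduced = extinction_coefficient(peptide, cystines_formed=False)
```

For real analyses the values reported by ProtParam itself should be used; the sketch only shows where the numbers come from.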
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the
success rate in the prediction of the secondary structure of proteins. Its
authors report improvements brought about by predicting all the sequences
of a set of aligned proteins belonging to the same family. This improved
SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a
three-state description of the secondary structure (α-helix, β-sheet and
coil) in a whole database containing 126 chains of non-homologous (less
than 25% identity) proteins. Joint prediction with SOPMA and a neural
networks method (PHD) correctly predicts 82.2% of residues for 74% of
co-predicted amino acids. Predictions are available by email to
deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
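Figures such as 69.5% are per-residue three-state accuracies: the fraction of residues whose predicted state matches the observed one. A minimal sketch of that calculation follows, assuming the states are encoded as H (helix), E (sheet) and C (coil); the two strings are hypothetical examples, not SOPMA output.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("empty state strings")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Hypothetical ten-residue example: one residue is predicted wrongly.
predicted_states = "HHHHEEECCC"
observed_states = "HHHEEEECCC"
q3 = three_state_accuracy(predicted_states, observed_states)
```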
PROTOCOL FOLLOWED
Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Identified the template structure by BLAST similarity search (PDB ID 2B8N)
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS)
Verified the model using the parameters available in SAVS, such as the Ramachandran plot
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
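The Ramachandran check in the validation step is based on the backbone dihedral angles phi and psi of each residue. As a hedged illustration of what the validation server computes internally, the sketch below calculates a dihedral angle from four atom positions; phi would use C(i-1), N(i), CA(i), C(i) and psi would use N(i), CA(i), C(i), N(i+1). The helper names and the test coordinates are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)            # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the outer atoms lie on the same side (cis),
# on opposite sides (trans), or at a right angle.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Plotting psi against phi for every residue gives the Ramachandran plot, in which residues falling outside the favoured regions flag possible errors in the model.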
Homology modeling
In protein structure prediction homology modeling also known as comparative
modeling is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target) Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence The sequence alignment and template
structure are then used to produce a structural model of the target Because protein
structures are more conserved than protein sequences detectable levels of sequence
similarity usually imply significant structural similarity
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
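Since expected model quality is tied to sequence identity, it can help to see how an identity figure is derived from a pairwise alignment. The following is a minimal sketch, assuming identity is counted only over columns where neither sequence has a gap; other definitions use a different denominator, and the function name and example strings are illustrative rather than taken from this project.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # indel column: there is no residue pair to compare
        compared += 1
        if q == t:
            identical += 1
    # Avoid division by zero when the alignment contains only gap columns.
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 comparable columns, 4 identical.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

Whichever definition is used, it should be stated alongside the number, because the same alignment can give noticeably different percentages under different denominators.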
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
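The E-value screening described above can be sketched as a simple filter over search hits. This is only an illustration: the `Hit` record, the template identifiers and the default cut-off of 1e-5 are assumptions made for the example, not values prescribed by BLAST or used in this project.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database search hit (hypothetical record layout)."""
    template_id: str
    bit_score: float
    e_value: float


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cut-off, best (lowest E-value) first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


if __name__ == "__main__":
    # Illustrative values only; the identifiers are placeholders.
    example = [
        Hit("tmplA", 206.0, 6e-54),
        Hit("tmplB", 27.0, 6.0),
        Hit("tmplC", 197.0, 2e-51),
    ]
    for hit in select_templates(example):
        print(hit.template_id, hit.e_value)
```

An empty result is itself informative: it suggests that no template is close enough for plain sequence search and that fold recognition may be the better route.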
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraint and
thus the interfaces are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
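The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be made concrete with a short sketch. It assumes rotations are given as angles about the x, y and z axes and applied in that order; this is one common convention, not necessarily the one used internally by any particular docking program, and all names here are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """Rotation about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rot_z, _matmul(rot_y, rot_x))


def place_ligand(coords: Sequence[Vec], angles: Vec, translation: Vec) -> List[Vec]:
    """Apply one rigid-body pose (3 rotations + 3 translations) to ligand atoms."""
    r = rotation_matrix(*angles)
    placed = []
    for x, y, z in coords:
        placed.append(tuple(
            r[i][0] * x + r[i][1] * y + r[i][2] * z + translation[i] for i in range(3)
        ))
    return placed


if __name__ == "__main__":
    # Rotate one atom by 90 degrees about z, then shift it by (1, 2, 3).
    print(place_ligand([(1.0, 0.0, 0.0)], (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0)))
```

A rigid-body docking search can be viewed as scoring many such poses; flexibility adds further internal degrees of freedom on top of these six.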
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic
of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The table below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ against ψ of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
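Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The sketch below shows one standard way to compute a signed torsion angle from four 3D points; the function name and the test coordinates are illustrative assumptions, not data from this project.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Eclipsed (cis) arrangement of four points.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applying this to every residue of a structure and plotting φ on the horizontal axis against ψ on the vertical axis gives the scatter that validation tools then compare with the allowed regions.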
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
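The idea of gradient optimization can be illustrated with the smallest possible system: two atoms interacting through a Lennard-Jones potential, started too close together. This is a toy sketch in reduced units (ε = σ = 1), with a step size and a per-step displacement cap chosen for the example; real force fields sum many more terms and real minimizers are more sophisticated.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step_size: float = 1e-3, max_move: float = 0.1,
                      tolerance: float = 1e-8, max_steps: int = 10000) -> float:
    """Steepest descent on the interatomic distance until the force is small."""
    r = r0
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tolerance:
            break
        # Move downhill; cap the displacement so a huge repulsive
        # force from a close contact cannot throw the atom far away.
        move = max(-max_move, min(max_move, -step_size * grad))
        r += move
    return r


if __name__ == "__main__":
    r_min = minimize_distance(0.95)
    print(r_min, lj_energy(r_min))
```

Starting from the overlapping distance, the separation relaxes to the nearby minimum at 2^(1/6) σ, which mirrors the point made above: minimization relieves a bad local contact but only ever finds the closest minimum.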
Result and discussion
1 Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206     6e-54
pdb   2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina               197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27      6.0
pdb   1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I               27      6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids)
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)    74      14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)    5       1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81      15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)    38      7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
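The two extinction coefficients reported above can be reproduced from the residue counts. The sketch below assumes the commonly used per-residue molar values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the function names are illustrative. With 5 Tyr, 4 Trp and 5 Cys it gives the same figures as the ProtParam output.

```python
# Assumed molar extinction contributions at 280 nm, in M-1 cm-1.
TYR_EXT = 1490
TRP_EXT = 5500
CYSTINE_EXT = 125


def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           all_cys_as_cystine: bool) -> int:
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return n_tyr * TYR_EXT + n_trp * TRP_EXT + n_cystine * CYSTINE_EXT


def absorbance_0_1_percent(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l solution: extinction coefficient / molecular weight."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    for paired in (True, False):
        ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_as_cystine=paired)
        print(ext, round(absorbance_0_1_percent(ext, 55252.6), 3))
```

The small difference between the two cases reflects how little the two possible cystines contribute compared with the Trp and Tyr residues.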
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1) Number of sequences: 10
2) Alignment score: 28565
3) Sequence format: Pearson
4) Sequence type: aa
5) Output file: clustalw2-20080510-09552541.output
6) Alignment file: clustalw2-20080510-09552541.aln
7) Guide tree file: clustalw2-20080510-09552541.dnd
8) Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file).
2) Choose icon - Swiss model - load the raw target sequence.
3) Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose icon - File - Save - Layer (pdb).
5) Choose icon - File - Save - Project (pdb).
6) Choose icon - Swiss model - submit modeling request. (A new browser window is opened; load the pdb file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose icon - File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044    Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                        Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
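The percentages in the plot statistics follow directly from the residue counts, taken over the non-glycine, non-proline residues. The short sketch below recomputes them and applies the 90% criterion quoted above; the function and key names are illustrative only.

```python
from typing import Dict, Tuple


def ramachandran_summary(most_favoured: int, additional: int, generous: int,
                         disallowed: int, threshold: float = 90.0) -> Tuple[Dict[str, float], bool]:
    """Return region percentages and whether the most favoured share exceeds the threshold."""
    counts = {
        "most favoured": most_favoured,
        "additionally allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())  # non-glycine, non-proline residues
    percentages = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return percentages, percentages["most favoured"] > threshold


if __name__ == "__main__":
    # Counts from the PROCHECK summary above.
    print(ramachandran_summary(990, 104, 11, 51))
```

For the counts reported here, the most favoured share is 85.6%, which is below the 90% expected of a good quality model.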
Docking result ---- by Hex software
Fig Ligand & Receptor (2B8N)
Fig after docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
75
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
1    1    000000  00      00      00      00      00      00     -1  -100
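
The search-space figures reported in the docking log can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that the punctuation lost from the log reads Iterate[27,812,1] and FFT[64,24,48], that the distance range is 0.00 to 19.50 in steps of 0.75, and that the first Iterate term is the number of distance steps. The function and variable names are hypothetical and are not part of Hex.

```python
# Cross-check of figures quoted in the docking log.
# Assumptions (not confirmed by the log itself): Iterate[27,812,1],
# FFT[64,24,48], and a distance range of 0.00-19.50 in steps of 0.75.

def count_distance_steps(start, stop, step):
    """Number of sampled distances from start to stop inclusive."""
    return int(round((stop - start) / step)) + 1


def product(values):
    """Multiply all values in an iterable."""
    result = 1
    for value in values:
        result *= value
    return result


distance_steps = count_distance_steps(0.0, 19.5, 0.75)
receptor_rotations = 812      # "Initial rotational increments ... Receptor 812"
ligand_rotations = 1          # "... Ligand 1"
fft_grid = (64, 24, 48)       # assumed reading of "FFT[642448]"

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
fft_count = distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the FFT grid.
orientations = fft_count * product(fft_grid)

# Net formal charge from "104 +ve and 114 -ve formal charged residues".
net_formal_charge = 104 - 114

print(distance_steps, fft_count, orientations, net_formal_charge)
```

Under these assumptions the numbers agree with the log: 27 distance steps, 21924 3D FFTs, 1616412672 orientations and a net formal charge of -10.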
Conclusion
After analyzing the protein sequences of the Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible to draw firm conclusions about the true picture of its
evolution in the near future, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structures of proteins present in the different
species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in the influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope, however, that these findings will
go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M Prescott, John P Harley and Donald A Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc Natl
Acad Sci 98: 14796-14801 [PubMed]
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J Mol Biol 311: 395-408
[PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res 25:
3389-3402 [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare (0.1% of cases). The virus enters via
the gut, replicates in the alimentary tract and spreads to infect the
liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
World-wide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa the
seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in
pregnant women; the mortality rate is high (up to 40%). Similar to
hepatitis A: the virus replicates in the gut initially before invading the
liver, and virus is shed in the stool prior to the onset of symptoms.
Viraemia is transient. A large inoculum of virus is needed to establish
infection. Little is known yet. The incidence of infection appears to be
low in first world countries.
Hepatitis C
Putative Togavirus, related to the Flavi- and Pestiviruses, and thus
probably enveloped. Has a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees. Incubation
period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of
individuals develop chronic infection following exposure, with a risk of:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide; high incidence in Japan, Italy and Spain.
In South Africa 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires Hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with Hepatitis B. It increases the severity of liver disease in
Hepatitis B carriers. The virus particle is 36 nm in diameter and is
encapsulated with HBsAg derived from HBV; delta antigen is associated with
virus particles. ssRNA genome. Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called Hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to
be a major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains
access to the blood stream. One of the most interesting features of the
hepatitis B virus is that the virus itself does not damage the liver; the
damage is caused by the individual's own immune system attacking the
virus-infected cells. Since liver damage from the virus may be very slight,
many patients are called healthy carriers. This means that, although they
may transmit the disease to others, they have normal-appearing livers and
normal liver function tests. While many individuals remain healthy for many
years or a lifetime, others develop chronic hepatitis, cirrhosis and,
occasionally, liver cell cancer. These outcomes are linked to the virus and
its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with
inflammation) do so on account of the body's normal inclination to attack
the foreign proteins contained in viruses and in the cells in which the
viruses are found. This process, called the immune response, determines the
pace and the severity of the liver cell injury in this condition and will
be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses
which are nearly identical have been identified in Eastern woodchucks,
ground squirrels and Peking ducks. The members of this virus family, termed
the hepadnaviruses, have life cycles similar to that observed in man and
can serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein; function unknown
Clinical Features
Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver,
and virus particles, as well as excess viral surface protein, are shed in
large amounts into the blood. Viraemia is prolonged and the blood of
infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic Persistent Hepatitis - the virus persists but there is minimal
liver damage
Chronic Active Hepatitis - there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure
2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Post-natal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence indicates
exposure to HBV.
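
The marker-by-marker reading listed above can be summarised as a simple lookup. The sketch below is only an illustration of that list and not a diagnostic tool: the dictionary, the function name `summarize_serology` and the marker labels are hypothetical, and the interpretations are restricted to what the text itself states.

```python
# Interpretation of each serological marker, taken from the list above.
# Names and structure are illustrative assumptions only.
SEROLOGY_NOTES = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection (not found in chronic carriers)",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV (remains present for life)",
}


def summarize_serology(positive_markers):
    """Return one interpretation line per positive marker, in table order."""
    positive = set(positive_markers)
    unknown = positive - set(SEROLOGY_NOTES)
    if unknown:
        raise ValueError(f"Unknown marker(s): {sorted(unknown)}")
    return [
        f"{marker}: {note}"
        for marker, note in SEROLOGY_NOTES.items()
        if marker in positive
    ]


# Example: a serum sample positive for HBsAg, HBeAg and core IgM.
for line in summarize_serology(["HBsAg", "HBeAg", "core IgM"]):
    print(line)
```

Keeping the interpretations in a single table makes it easy to compare the markers side by side, for example that anti-HBs and HBsAg point in opposite directions.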
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms
of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and
occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the
infection completely and develop immunity, but frequently become chronic
carriers.
The outcome of hepatitis B infection depends to a great extent on the
status of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their ongoing
infection, although some have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome
Genome: circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
The transcription and translation of these proteins is achieved through the
use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation
and mark a specific transcript for encapsidation into the nucleocapsid.
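
The idea of reading frames on a circular genome with several in-frame start codons can be illustrated with a small sketch. It is a minimal example under stated assumptions and not the method used in this report: the function name `find_orfs_circular`, the `min_codons` parameter and the toy sequence are hypothetical, only one strand is scanned (the text states that the frames run in one direction), and ATG is taken as the only start codon.

```python
# Minimal ORF scan on one strand of a circular DNA sequence.
# Assumptions: ATG is the only start codon; TAA/TAG/TGA are stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs_circular(genome, min_codons=2):
    """Return (start, end, sequence) for every ATG followed by an in-frame stop.

    start is the 0-based index of the A of ATG; end is the index just past
    the stop codon, taken modulo the genome length because the genome is
    circular. min_codons counts the codons before the stop codon.
    """
    genome = genome.upper()
    n = len(genome)
    doubled = genome + genome  # lets a frame run past the origin once
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, at most once around the circle.
        for pos in range(start, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, (pos + 3) % n, doubled[start:pos + 3]))
                break
    return orfs


# Toy sequence (not HBV): two in-frame ATGs share the same stop codon.
toy_genome = "ATGAAAATGCCCTAA"
for start, end, seq in find_orfs_circular(toy_genome):
    print(start, end, seq)
```

In the toy sequence the two start codons lie in the same frame and end at the same stop codon, so the shorter product is an N-terminally truncated form of the longer one. This is the sense in which multiple in-frame start codons allow one reading frame to give rise to more than one protein.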
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The viral DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA polymerase II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3′ end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of
hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid,
which is made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present,
and these usually outnumber the virions in a ratio of 100:1. The two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to collectively as surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein. The absence of the hepatitis
B core, polymerase and genome makes these particles non-infectious. High
levels of these non-infectious particles can be found during the acute phase
of the infection. Since the non-infectious particles present the same sites
as the virion, they induce a significant immune response and are thought to
be disadvantageous for the virus. However, it is also believed that the
presence of high levels of non-infectious particles may allow the infectious
viral particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg)- There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope which may be
responsible for triggering the immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185-amino-acid protein expressed in
the cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. Serum of individuals
infected with hepatitis B contains three distinct antigen particles: a
spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles
that vary in length. The Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication strategy
of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed, with an
emphasis on past accomplishments as well as goals for the future.
Antigen
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
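The 30% rule can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a pairwise alignment is already available as two equal-length gapped strings, and it assumes one particular definition of percent identity (identical residues divided by gap-free aligned columns). Other definitions divide by the alignment length or by the shorter sequence length, so the function name, the definition and the example strings are assumptions rather than part of the original protocol.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both arguments must be the aligned strings (same length, '-' for gaps).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    aligned_columns = 0
    identical_columns = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if q == t:
            identical_columns += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True if the alignment is above the (assumed) 30% identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment, not taken from the HBeAg-binding protein.
identity = percent_identity("ACDE-FG", "ACDQKFG")
print(f"{identity:.1f}% identity")
```

In this toy alignment six columns are gap-free and five of them are identical, so the template would sit comfortably above the threshold.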
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs
in this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts) and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu
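To illustrate what an optimal local alignment means, the following is a minimal sketch of the Smith-Waterman recurrence. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary illustrative choices), whereas real searches typically use substitution matrices and affine gap penalties. It returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    H[i][j] holds the best score of any local alignment ending at
    a[i-1] and b[j-1]; scores are never allowed to drop below zero,
    which is what makes the alignment local rather than global.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                              # start a new local alignment
                H[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,               # gap in b
                H[i][j - 1] + gap,               # gap in a
            )
            best = max(best, H[i][j])
    return best


# Toy sequences (hypothetical): the shared core "ACGT" gives 4 matches.
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

The heuristic programs in the package trade this exhaustive dynamic programming for speed, which is why SSEARCH is slower but guaranteed to find the best-scoring local alignment under the chosen scoring scheme.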
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters, among others:
extinction coefficient
half-life
instability index
aliphatic index
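As an illustration of how one of these parameters is derived from sequence alone, the sketch below computes the aliphatic index using the commonly cited formula of Ikai (1980), AI = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), with a = 2.9, b = 3.9 and X the mole percent of each residue. This is a re-implementation for illustration, not ProtParam's own code, and the function name and the toy sequence are assumptions.

```python
def aliphatic_index(sequence: str, a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index of a protein from its one-letter sequence.

    AI = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)),
    where X(...) is the mole percent (100 * count / length) of a residue.
    """
    seq = "".join(sequence.split()).upper()  # ignore white space
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Hypothetical four-residue example: 25% each of Ala, Val, Ile, Leu.
print(aliphatic_index("AVIL"))
```

A higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains and is usually read as an indicator of greater thermostability for globular proteins.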
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database
containing 126 chains of non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
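The percentages quoted above are three-state per-residue accuracies, often called Q3. A minimal sketch of that measure is given below, assuming predicted and observed structures are encoded as equal-length strings over H (helix), E (sheet) and C (coil). The encoding, the function name and the example strings are illustrative assumptions and are not taken from the SOPMA server output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state assignment is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    if not predicted:
        raise ValueError("empty secondary-structure strings")

    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be H (helix), E (sheet) or C (coil)")

    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(predicted)


# Hypothetical 8-residue example with one wrongly predicted residue.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```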
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule or to foster association with
another protein or nucleic acid.
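Agreement between matched Cα atoms of a model and a reference structure is usually reported as a root-mean-square deviation (RMSD). The sketch below assumes that this is the measure meant above and that the two structures have already been superimposed and their Cα atoms paired; it performs no fitting itself. The function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coordinate],
            reference: Sequence[Coordinate]) -> float:
    """RMSD (in the units of the input, e.g. angstroms) between two
    equally long lists of matched, already superimposed C-alpha atoms."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must contain the same atoms")
    if not model:
        raise ValueError("no atoms supplied")

    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))


# Hypothetical coordinates: every atom displaced by 2 units along z.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_atoms = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model_atoms, reference_atoms))
```

In practice an optimal superposition (for example by the Kabsch algorithm) is applied before this calculation; that step is deliberately left out here to keep the example short.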
Figure. First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of
a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds. The
chief inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which
improve upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
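The E-value screening step described above can be sketched as a simple filter over search hits. Everything in this example is hypothetical: the `Hit` record, the template identifiers, the numbers and the cutoff of 1e-5 are illustrative assumptions, since the text does not state which cutoff was used, and the hits are not results from this study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str         # hypothetical identifier, not a real PDB entry
    evalue: float            # expected number of chance hits with this score
    percent_identity: float  # identity of the query-template alignment


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best candidates first.

    Hits are ranked by ascending E-value; ties are broken by higher
    percent identity.
    """
    kept = [hit for hit in hits if hit.evalue <= max_evalue]
    return sorted(kept, key=lambda hit: (hit.evalue, -hit.percent_identity))


# Made-up search results used only to exercise the function.
hits = [
    Hit("tmplA", 3e-42, 41.0),
    Hit("tmplB", 0.8, 22.0),    # poor E-value: rejected
    Hit("tmplC", 2e-9, 33.5),
]
ranked = select_templates(hits)
print([hit.template_id for hit in ranked])
```

An empty result from such a filter corresponds to the case discussed above in which no reliable template exists and fold-recognition servers are the better route.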
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions and of the predicted query and observed
template secondary structures. Perhaps most important is the coverage of the
aligned regions: the fraction of the query sequence structure that can be
predicted from the template and the plausibility of the resulting model.
Thus sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do
not have the ability to alter their conformation in order to improve binding
and ease movement. Conformational changes are limited by steric constraint
and thus are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to
as a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor
or a flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility
in the receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is
intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are
taken into consideration (three rotational, three translational), the amount
of inherent flexibility allowed the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing for
easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease hepatitis B, and will
also open new avenues for finding other possible drug targets in the
hepatitis B virus. The docking results can be used to design new lead
compounds and hence can aid in the new drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components.
It is a molecule within a cell or on a cell surface to which a substance
(such as a hormone or a drug) selectively binds, causing a change in the
activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG‡ of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against
ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were treated
as hard spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
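Each point on a Ramachandran plot comes from two torsion angles, and a torsion angle is computed from the coordinates of four consecutive atoms. The sketch below shows one common way to do this in plain Python; it is a generic geometric routine, not the code used by SAVS or PROCHECK, and the coordinates in the example are made up. For a residue i, φ would be taken over C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1).

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def _sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vector, b: Vector) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vector, b: Vector) -> Vector:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vector, p1: Vector, p2: Vector, p3: Vector) -> float:
    """Torsion angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b0 = _sub(p0, p1)            # from the central bond back to the first atom
    b1 = _sub(p2, p1)            # the central bond
    b2 = _sub(p3, p2)            # from the central bond on to the last atom

    length = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)

    # Remove the components of b0 and b2 that lie along the central bond,
    # leaving their projections onto the plane perpendicular to it.
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])

    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical atom positions giving a right-angle torsion.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing φ and ψ this way for every residue and plotting the pairs gives the scatter that is then compared with the sterically allowed regions.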
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming conventions.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory,
which have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints on a residue, but it will not pass through high
energy barriers and stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
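The idea of moving atoms along the net force until the force is small can be sketched on a toy system. The example below is an assumption-laden illustration, not the force field or minimizer used in this work: it takes two atoms interacting through a 12-6 Lennard-Jones potential in reduced units (epsilon = sigma = 1), starts them too close together, and applies steepest descent with a capped step size. The names `lj_energy`, `lj_force` and `minimize` are hypothetical.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, i.e. minus the derivative of lj_energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the force until the force is below tol."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap the displacement so one huge repulsive force cannot
        # throw the atoms far apart in a single step
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r

r_start = 0.9                 # atoms too close: large repulsion energy
r_min = minimize(r_start)     # converges to the nearby minimum at 2**(1/6)
print(lj_energy(r_start), r_min, lj_energy(r_min))
```

The single bad contact at the start dominates the energy, and the minimizer removes it by relaxing to the nearest minimum. It cannot cross an energy barrier to reach a different, possibly lower, minimum.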
Results and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db AC        Description                                                        Score  E-value
pdb 1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb 2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb 2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb 1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp       27    6.0
pdb 1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I                  27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
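The two extinction coefficients reported above can be checked by hand. The sketch below assumes the commonly used residue contributions at 280 nm (5500 per Trp, 1490 per Tyr, 125 per cystine, in M-1 cm-1) and reads the molecular weight as 55252.6; the function names are hypothetical and this is not ProtParam's own code.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from residue counts."""
    # Each cystine is formed by a pair of Cys residues; an odd Cys stays unpaired
    n_cystine = n_cys // 2 if cystines else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine

def absorbance_1_g_per_l(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coeff / mol_weight

# Counts taken from the amino acid composition: Trp 4, Tyr 5, Cys 5
mol_weight = 55252.6  # assumed reading of the reported molecular weight
ext_all = extinction_coefficient(4, 5, 5, cystines=True)
ext_none = extinction_coefficient(4, 5, 5, cystines=False)
print(ext_all, round(absorbance_1_g_per_l(ext_all, mol_weight), 3))
print(ext_none, round(absorbance_1_g_per_l(ext_none, mol_weight), 3))
```

Under these assumptions the sketch reproduces the reported values of 29700 and 29450 for the coefficients and 0.538 and 0.533 for the absorbances.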
Secondary structure prediction
By SOPMA
Result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
310 helix       (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4   ------------------------------------------------------------
Q91C37|HBEAG_HBVA6   ------------------------------------------------------------
P0C692|HBEAG_HBVA2   ------------------------------------------------------------
P0C625|HBEAG_HBVA3   ------------------------------------------------------------
Q81105|HBEAG_HBVA5   ------------------------------------------------------------
Q64896|HBEAG_ASHV    ------------------------------------------------------------
P03153|HBEAG_GSHV    ------------------------------------------------------------
P03154|HBEAG_DHBV1   ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3   ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN   MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3   ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5   ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV    ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV    ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3   ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN   LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4   DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2   DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3   DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5   DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV    DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1   DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3   DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN   ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3   --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5   --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV    --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV    --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1   --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3   --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN   ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6   ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2   QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3   QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5   QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV    QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV    QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1   QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3   QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN   LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4   -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3   -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5   -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV    -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV    -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1   TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3   TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN   CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5   NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV    NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV    NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1   DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3   DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN   PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3   RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5   RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV    RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV    RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1   RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3   RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN   GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4   -------------------------------------------
Q91C37|HBEAG_HBVA6   -------------------------------------------
P0C692|HBEAG_HBVA2   -------------------------------------------
P0C625|HBEAG_HBVA3   -------------------------------------------
Q81105|HBEAG_HBVA5   -------------------------------------------
Q64896|HBEAG_ASHV    -------------------------------------------
P03153|HBEAG_GSHV    -------------------------------------------
P03154|HBEAG_DHBV1   -------------------------------------------
P0C6J9|HBEAG_DHBV3   -------------------------------------------
Q8IVS8|GLCTK_HUMAN   ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
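A guide tree of this kind is stored in Newick format and can be inspected with a few lines of code. The sketch below is only illustrative: the Newick string is a reconstruction of the tree above that assumes the usual `name:length` punctuation and five-decimal branch lengths, and the regular expression only pulls out the terminal (leaf) branch lengths rather than parsing the full tree topology.

```python
import re

# Reconstructed guide tree (punctuation and decimal points are assumed)
newick = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name immediately followed by ':length'; internal branch
# lengths follow a ')' and therefore carry no name, so they are skipped
leaf_lengths = {
    name: float(length)
    for name, length in re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
}

longest = max(leaf_lengths, key=leaf_lengths.get)
print(len(leaf_lengths), longest, leaf_lengths[longest])
```

Under this reading the human glycerate kinase has by far the longest terminal branch, which agrees with its very low pairwise scores against the HBeAg sequences.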
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
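The percentages in the plot statistics follow directly from the residue counts, and the 90% criterion can be checked the same way. The short sketch below uses the four region counts reported above; the variable names are hypothetical and the threshold is the one quoted in the PROCHECK summary.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total
total = sum(counts.values())
percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# A good quality model is expected to have over 90% in the most favoured regions
meets_criterion = percent["most favoured"] > 90.0

print(total, percent, meets_criterion)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, so the regions containing the outlying residues would be natural candidates for further refinement.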
Docking result ---- by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue: A 253 LYS
Warning: Can't add all hydrogens to incomplete residue: A 316 HIS
Warning: Can't add all hydrogens to incomplete residue: A 318 LYS
Warning: Can't add all hydrogens to incomplete residue: A 393 LYS
Warning: Can't add all hydrogens to incomplete residue: B 8 LYS
Warning: Can't add all hydrogens to incomplete residue: B 9 LYS
Warning: Can't add all hydrogens to incomplete residue: B 17 LYS
Warning: Can't add all hydrogens to incomplete residue: B 34 LYS
Warning: Can't add all hydrogens to incomplete residue: B 36 ASN
Warning: Can't add all hydrogens to incomplete residue: B 62 LYS
Warning: Can't add all hydrogens to incomplete residue: B 65 ARG
Warning: Can't add all hydrogens to incomplete residue: B 66 LYS
Warning: Can't add all hydrogens to incomplete residue: B 316 HIS
Warning: Can't add all hydrogens to incomplete residue: B 370 LYS
Warning: Can't add all hydrogens to incomplete residue: B 380 TYR
Warning: Can't add all hydrogens to incomplete residue: B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue: A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue: A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue: A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue: A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue: A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue: B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue: B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue: B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue: B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue: B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue: B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue: B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue: B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue: B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue: B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue: B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004
LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022
LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014
MSEC Radius = 140 Charge = 053
MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004
MSECG Radius = 170 Charge = 009
MSESE Radius = 190 Charge = 032
MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023
LYSC Radius = 140 Charge = 053
LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004
LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023
LYSC Radius = 140 Charge = 053
LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004
LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022
LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
ASPN Radius = 140 Charge = -052
ASPCA Radius = 150 Charge = 025
ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050
ASPCB Radius = 170 Charge = -021
ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014
MSEC Radius = 140 Charge = 053
MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004
MSECG Radius = 170 Charge = 009
MSESE Radius = 190 Charge = 032
MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023
LYSC Radius = 140 Charge = 053
LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004
LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022
LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014
MSEC Radius = 140 Charge = 053
MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004
MSECG Radius = 170 Charge = 009
MSESE Radius = 190 Charge = 032
MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
THRN Radius = 140 Charge = -052
THRCA Radius = 150 Charge = 027
THRC Radius = 140 Charge = 053
THRO Radius = 150 Charge = -050
THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will be possible in the near future to draw firm
conclusions about the true picture of its evolution, and that the genes
responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To determine the unknown structure of proteins
present in the different species, we carried out homology modelling and went
a step further to present a theoretical model using available online modelling tools.
Our study indicates that the HBeAg-related protein (glycerate kinase)
encoded by the gene is a secondary contributor to the pathogenicity
of HBV. We therefore tried to dock this protein with an appropriate ligand in
order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases, so that possible therapeutic sites can be identified in
them and we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Hepatitis C
Putative togavirus related to the flaviviruses and pestiviruses,
and thus probably enveloped. Has a ssRNA genome.
Does not grow in cell culture but can infect chimpanzees. Incubation period:
6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic world-wide, with high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B. Increased severity of liver disease in hepatitis B
carriers. Virus particle 36 nm in diameter, encapsulated with HBsAg derived
from HBV; delta antigen is associated with virus particles. ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis but is no longer believed to be a
major agent of liver disease. It has been classified as a flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the blood stream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients are
called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis and, occasionally, liver
cell cancer. These outcomes are linked to the virus and its effects, although it
is unlikely that the virus directly causes cancer. Those patients who develop
hepatitis (damage to liver cells with inflammation) do so on account of the
body's normal inclination to attack the foreign proteins contained in viruses
and in the cells in which the viruses are found. This process, called the
immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.
Since the identification of the hepatitis B virus, several other viruses which
are nearly identical have been identified in Eastern woodchucks, ground
squirrels and Peking ducks. The members of this virus family, termed the
hepadnaviruses, have similar life cycles to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of
humans), Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure. Patients who become
persistently infected are at risk of developing hepatocellular carcinoma
(HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Fig: Hepatitis B virus in serum
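The serological markers listed above can be summarised as a simple lookup table. The following is a minimal Python sketch for illustration only, not a diagnostic tool: the names INTERPRETATION and interpret are hypothetical, and each entry merely restates the one-line meaning given in the text for that marker.

```python
# Illustrative summary of the serology text above; not a clinical decision aid.
# The dictionary and function names are assumptions made for this sketch.
INTERPRETATION = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "high level of viral replication",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def interpret(positive_markers):
    """Return the textbook meaning of each positive marker, in table order."""
    unknown = set(positive_markers) - set(INTERPRETATION)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return [INTERPRETATION[m] for m in INTERPRETATION if m in positive_markers]


if __name__ == "__main__":
    # A person who has cleared the infection: anti-HBs and core IgG positive.
    for line in interpret({"anti-HBs", "core IgG"}):
        print(line)
```

Keeping the table as plain data makes it easy to check each entry against the text; combinations of markers are deliberately not interpreted here.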
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms
following exposure, may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of their
on-going infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to its 5' end. The other strand, the plus strand, is
variable in length but always shorter than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins occurs through the use of
multiple in-frame start codons. The HBV genome also contains elements that
regulate transcription, determine the site of polyadenylation, and mark a
specific transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
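The idea of several in-frame start codons within one reading frame can be illustrated with a short script. This is a minimal sketch under stated assumptions: the function name in_frame_start_codons is hypothetical, the toy sequence is invented for illustration and is not an HBV sequence, and the frame is assumed to begin at the first nucleotide of the string.

```python
def in_frame_start_codons(orf):
    """Return 0-based offsets of every ATG lying in the reading frame of `orf`.

    The frame is assumed to start at position 0, so only offsets that are
    multiples of three are examined.
    """
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "ATG"]


if __name__ == "__main__":
    # Toy sequence (invented): codons ATG GCC ATG AAA TGT TGA.
    # It also contains an out-of-frame ATG at offset 11, which is ignored.
    toy_orf = "ATGGCCATGAAATGTTGA"
    print(in_frame_start_codons(toy_orf))
```

Translation starting from each reported offset would give proteins of different lengths that share the same downstream sequence, which is the general principle the paragraph above refers to.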
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA polymerase II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al. 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually make up a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to collectively as surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be disadvantageous
for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. A 185 amino acid protein, it is expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and
tubular or filamentous particles that vary in length. Of these, the Dane
particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have a greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
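The 30% rule discussed above can be checked on a pairwise alignment with a few lines of Python. This is a minimal sketch, not the procedure used in this work: the function names are hypothetical, gaps are assumed to be written as "-", and identity is computed here over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [
        (q, t)
        for q, t in zip(aligned_query.upper(), aligned_template.upper())
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def passes_30_percent_rule(aligned_query, aligned_template, threshold=30.0):
    """True when identity exceeds the threshold quoted in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment (invented): 4 identical residues out of 5 ungapped columns.
    print(percent_identity("ACDE-G", "ACDFHG"))
    print(passes_30_percent_rule("ACDE-G", "ACDFHG"))
```

A template that fails this check is not necessarily unusable, but according to the text above the alignment, and hence the model, is likely to be less reliable.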
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy, and a high level of integration with other
databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4). Primary accession number: Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
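To make the idea behind the Smith-Waterman algorithm concrete, the following is a minimal sketch of its scoring recurrence. It is not the SSEARCH implementation: the match, mismatch, and gap values are illustrative assumptions, and a linear gap penalty with a simple match/mismatch score stands in for the substitution matrices and affine gap penalties that real search programs use.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence b
            left = H[i][j - 1] + gap    # gap in sequence a
            # The zero lets an alignment restart anywhere, which makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because every cell is floored at zero, poorly matching flanking regions do not drag down the score of a well-matching local segment.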
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
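As an illustration of one of these parameters, the sketch below computes the aliphatic index from a raw sequence using the standard formula AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is hypothetical, and the filtering of white space and digits simply mirrors the input behaviour described above; this is not ProtParam's own code.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code."""
    # Keep letters only, so that white space and numbers are ignored.
    residues = [c for c in sequence.upper() if c.isalpha()]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no residues")

    def mole_percent(code):
        return 100.0 * residues.count(code) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy sequence with equal amounts of Ala, Val, Ile and Leu.
    print(aliphatic_index("AVIL"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of greater thermostability.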
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM) has been
described to improve the success rate in the prediction of the
secondary structure of proteins. Its authors report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model using different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
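The E-value screening step described above can be sketched as a simple filter. The hit identifiers, E-values, and the 1e-5 cutoff below are hypothetical values chosen for illustration; they are not taken from this study, and a suitable cutoff depends on the database and the query.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cutoff, best hit first.

    hits is a list of (template_id, evalue) pairs; the cutoff is an assumption.
    """
    accepted = [hit for hit in hits if hit[1] <= max_evalue]
    return sorted(accepted, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    example_hits = [("tmplA", 3e-40), ("tmplB", 2.5), ("tmplC", 1e-12)]
    for template_id, evalue in select_templates(example_hits):
        print(template_id, evalue)
```

An empty result signals that no hit is close enough for this simple approach, which is the situation in which fold-recognition servers become the better option.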
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible
receptor or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement, which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG‡ of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles phi and psi. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
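Each point on such a plot comes from two torsion angles computed from backbone atom coordinates. The sketch below shows one common way to compute a torsion angle from four points; the function and helper names are hypothetical, and the coordinates in the usage example are toy values rather than atoms from the modeled receptor. For phi the four atoms would be C(i-1), N(i), Cα(i), C(i), and for psi they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the central bond lies along the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using atan2 on the projected vectors keeps the sign of the angle, which is what distinguishes the left and right halves of the plot.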
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed
residue-by-residue listing. PROCHECK generates a number of output files in
the default directory, which have the same name as the original PDB file but
with different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
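The gradient optimization described above can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. This is a toy sketch in reduced units, assuming epsilon = sigma = 1 and a hypothetical step size and step cap; it is not the force field or minimizer used by any particular modeling package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on r; each move is capped so a clash cannot fling atoms apart."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g                       # move downhill in energy
        move = max(-max_move, min(max_move, move))
        r += move
    return r


if __name__ == "__main__":
    start = 0.95                               # atoms too close: large repulsion
    end = minimize_distance(start)
    print(lj_energy(start), lj_energy(end), end)
```

Starting from a clashing distance, the energy drops from a positive value to the well depth of about -1 at a separation of 2^(1/6) sigma, which mirrors how a single bad contact can dominate the total energy until it is relaxed.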
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C (Hbcag)Human Hepatitis B Viral Capsid >gi|5                  206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                 27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
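The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125); the function names are hypothetical, and the inputs are the Tyr, Trp, and Cys counts and the molecular weight from the ProtParam output.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm, in M-1 cm-1.

    With all_cys_paired, every two Cys residues count as one cystine.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6                    # molecular weight from the ProtParam output
    for paired in (True, False):
        ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_paired=paired)
        print(ext, round(absorbance_1_g_per_l(ext, mw), 3))
```

With 5 Tyr, 4 Trp, and 5 Cys, the two cases give 29700 and 29450, and dividing by the molecular weight gives the reported absorbances of 0.538 and 0.533.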
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
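As a consistency check on the SOPMA summary, the percentages follow directly from the residue counts and the sequence length. The short sketch below recomputes them; the function name and dictionary keys are hypothetical, and only the four non-zero states are included.

```python
def state_percentages(counts):
    """Percentage of residues in each state, rounded to two decimals."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


if __name__ == "__main__":
    # Residue counts taken from the SOPMA output above.
    sopma_counts = {
        "alpha helix": 235,
        "extended strand": 80,
        "beta turn": 36,
        "random coil": 172,
    }
    print(sum(sopma_counts.values()))
    print(state_percentages(sopma_counts))
```

The four counts add up to the full sequence length of 523 residues, so no residue is left unassigned.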
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
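The pairwise scores in the table are best read as percent identities between the two sequences of each pair. The sketch below shows one way such a score can be computed from two aligned sequences, under the assumption that positions containing a gap in either sequence are excluded from the comparison. The function name and the short example strings are hypothetical and are not taken from the alignment in this report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue                 # gap positions are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACD-F", "ACE-F"))
```

Read this way, scores of 2 to 3 between the human glycerate kinase and the HBeAg sequences indicate essentially no sequence similarity, while scores near 97 to 98 among the HBV strains indicate nearly identical sequences.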
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
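A guide tree of this kind is conventionally written in Newick notation, in which parentheses group sibling branches and each name is followed by a colon and a branch length. As a rough illustration only, the sketch below parses a small made-up tree and sums the branch lengths from the root to each leaf. The leaf names A, B and C and their lengths are hypothetical and are not values from this study; the helper names parse_newick and leaf_depths are ours, and the parser handles only the simple unquoted form of the notation.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node name (empty for unnamed internal nodes).
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0, out=None):
    """Return {leaf name: summed branch length from the root}."""
    if out is None:
        out = {}
    name, length, children = tree
    depth += length or 0.0
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


# Hypothetical toy tree, not data from this report.
toy = "((A:0.1,B:0.2):0.3,C:0.4);"
depths = leaf_depths(parse_newick(toy))
for leaf in sorted(depths):
    print(leaf, round(depths[leaf], 5))
```

Leaves that join deep inside the parentheses with short branch lengths are the ones the alignment program treated as most similar.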
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
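Percent identity between two aligned sequences can be checked by hand. The sketch below is one simple way to do it, assuming identity is counted over the columns in which neither sequence has a gap; other tools use a different denominator, so their figures may differ slightly. The function name percent_identity is illustrative, and the two strings are one 32-residue stretch of the HBVA4 and HBVA6 rows from the alignment above, not the template and target pair itself.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# One stretch of the multiple alignment (HBVA4 versus HBVA6).
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(f"{percent_identity(hbva4, hbva6):.2f}")  # 30 of 32 columns match: 93.75
```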
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill in these fields:
Your e-mail address: Lakshay1202gmailcom (MUST be correct)
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure by plotting one against the other. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays these two angles for each residue in the protein. The distance between two successive alpha-carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the protein of interest can be assessed using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
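For reference, φ of residue i is the dihedral angle defined by the atoms C(i-1), N(i), CA(i) and C(i), and ψ is the one defined by N(i), CA(i), C(i) and N(i+1). The following is a minimal sketch of how a single dihedral angle can be computed from four atomic positions with the usual atan2 formula. The function name dihedral and the coordinates are illustrative test points, not atoms taken from the modelled structure, and this is not the code used by PROCHECK or SAVS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    u1 = _sub(p1, p0)
    u2 = _sub(p2, p1)
    u3 = _sub(p3, p2)
    n1 = _cross(u1, u2)  # normal of the plane through p0, p1, p2
    n2 = _cross(u2, u3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond runs along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Applying the same function to the backbone atoms of every residue gives the (φ, ψ) pairs that are placed on the Ramachandran plot.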
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics

                                                       Score      %
Residues in most favoured regions [A,B,L]                990   85.6
Residues in additional allowed regions [a,b,l,p]         104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0
Residues in disallowed regions                            51    4.4
                                                        ----  -----
Number of non-glycine and non-proline residues          1156  100.0

Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
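The percentages in the PROCHECK summary follow directly from the residue counts. As a quick check, the sketch below recomputes them from the four counts reported above and compares the most favoured fraction with the 90% rule of thumb. The variable names are illustrative, and the threshold test is only the guideline quoted in the summary, not a full validation.

```python
# Residue counts per Ramachandran region, as reported in the summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the denominator for the percentages.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}")

# Rule of thumb: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("total:", total, "meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which is worth keeping in mind when interpreting the docking results.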
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that, although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of its evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we carried out homology modelling. We then went a step further and
presented a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
cause of parenterally transmitted hepatitis, but is no longer believed to be a
major agent of liver disease. It has been classified as a Flavivirus.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the blood stream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients
are called "healthy carriers". This means that, although they may transmit
the disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis and, occasionally,
liver cell cancer. These outcomes are linked to the virus and its effects,
although it is unlikely that the virus directly causes cancer. Those
patients who develop hepatitis (damage to liver cells with inflammation) do
so on account of the body's normal inclination to attack the foreign
proteins contained in viruses and in the cells in which the viruses are
found. This process, called the immune response, determines the pace and the
severity of the liver cell injury in this condition and will be described in
more detail below.
Since the identification of the hepatitis B virus, several other viruses
which are nearly identical have been identified in Eastern woodchucks,
ground squirrels and Peking ducks. The members of this virus family, termed
the Hepadna viruses, have life cycles similar to that observed in man and
can serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein; function unknown
Clinical Features
Incubation period: 2-5 months
Insidious onset of symptoms. Tends to cause a more severe disease than
Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes and on-going liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.
2) Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
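The marker rules listed above can be read as a small decision procedure. The following is a minimal sketch, assuming each marker is reported simply as present or absent; the function name, the order of the checks and the wording of the returned labels are illustrative assumptions, not a clinical algorithm. Reading an isolated anti-HBs result as consistent with vaccination is also an added assumption.

```python
def interpret_hbv_serology(hbsag, anti_hbs, hbeag, anti_hbe, core_igm, core_igg):
    """Rough reading of an HBV marker panel; every argument is a bool.

    Hypothetical helper: it only encodes the rules stated in the text above.
    """
    if not any((hbsag, anti_hbs, hbeag, anti_hbe, core_igm, core_igg)):
        return "no serological evidence of HBV exposure"
    if core_igm:
        # Core IgM rises early and indicates recent infection.
        return "recent (acute) infection"
    if hbsag:
        # Surface antigen without core IgM: persistent carriage.
        if hbeag:
            return "chronic carrier, high viral replication"
        if anti_hbe:
            return "chronic carrier, low infectivity"
        return "chronic carrier"
    if anti_hbs and core_igg:
        # Anti-HBs indicates immunity; core IgG indicates past exposure.
        return "resolved infection, immune"
    if anti_hbs:
        # Assumption: anti-HBs alone is consistent with vaccination.
        return "immune without core antibody (consistent with vaccination)"
    if core_igg:
        return "past exposure to HBV"
    return "indeterminate pattern"


print(interpret_hbv_serology(True, False, False, True, False, True))
```

The checks are ordered so that the most specific marker (core IgM) is tested first; a real laboratory report would also consider titres and timing, which this sketch ignores.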
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms
following exposure, may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of their
on-going infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length, but is less than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins is through the use of multiple
in-frame start codons. The HBV genome also contains parts that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
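To make the idea of overlapping reading frames and multiple in-frame start codons concrete, the sketch below scans the three forward frames of a sequence for ATG-to-stop stretches and lists every ATG that lies in frame within one of them. The function names and the 21-nucleotide toy sequence are invented for illustration; it is not HBV sequence, and a real analysis would run on the genome record retrieved from a database.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for ATG..stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def in_frame_starts(seq, start, end):
    """Positions of every ATG that is in frame with `start` inside [start, end)."""
    seq = seq.upper()
    return [i for i in range(start, end - 2, 3) if seq[i:i + 3] == "ATG"]


# Toy sequence (hypothetical): a frame-0 ORF with two in-frame ATGs,
# overlapped by a short ORF in frame 2.
toy = "ATGGCCATGCCATGTGACTAA"
print(find_orfs(toy))               # [(0, 21, 0), (11, 17, 2)]
print(in_frame_starts(toy, 0, 21))  # [0, 6]
```

In the toy example the second ORF lies entirely inside the first but in a different frame, which is the kind of overlap the compact genome organization relies on; the two in-frame ATGs show how one reading frame can give rise to proteins of different length.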
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to be able to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA pol II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be disadvantageous
for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Regardless of the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. It is a 185 amino acid protein expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains 3 distinct antigenic particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and
tubular or filamentous particles that vary in length. Of these, only the Dane
particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the "30% rule", because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
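The 30% figure refers to sequence identity measured on an alignment. As a minimal sketch, assuming identity is counted over columns where neither sequence has a gap (other conventions divide by the shorter sequence length or by the full alignment length), it can be computed as follows. The function name and the two aligned strings are made up for illustration.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments: 5 identical residues over 6 gap-free columns.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(round(identity, 1), "above 30% threshold:", identity > 30.0)
```

Because the denominator convention changes the number, any identity quoted alongside a template should state which convention was used.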
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction into ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information - all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domains structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
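The Smith-Waterman algorithm mentioned above can be sketched in a few lines. This is a simplified illustration, not the SSEARCH implementation: it assumes a flat match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary choices), whereas production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/col stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("AAACGTAAA", "CCACGTCC"))  # (8, 'ACGT', 'ACGT')
```

The floor at zero is what makes the alignment local: poorly matching flanks are dropped rather than penalised, so only the shared ACGT core is reported in the example.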
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
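Two of the listed parameters follow from simple published formulas and can be sketched directly. The aliphatic index is taken here as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, and the extinction coefficient at 280 nm as 1490 per Tyr, 5500 per Trp and 125 per cystine. The function names and the short test peptides are illustrative assumptions, and the code is not the ProtParam implementation. Half-life and instability index depend on lookup tables and are not reproduced.

```python
def aliphatic_index(seq):
    """Aliphatic index from mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines_formed=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), measured in water.

    With cystines_formed=True all Cys are assumed to pair into cystines.
    """
    seq = seq.upper()
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * n_cystine


# Hypothetical peptides, used only to exercise the formulas.
print(aliphatic_index("AVIL"))
print(extinction_coefficient("WYYCC"))                         # 8605
print(extinction_coefficient("WYYCC", cystines_formed=False))  # 8480
```

Reporting the extinction coefficient both with and without cystines mirrors the two values such tools usually give, since the oxidation state of the cysteines is not known from sequence alone.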
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
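The percentages quoted above are three-state per-residue accuracies, often called Q3. The sketch below shows how such a figure, and the accuracy over residues where two methods agree, could be computed. It assumes predictions and observations are given as equal-length strings over H (helix), E (sheet) and C (coil); the function names and the short strings are made up and are not SOPMA or PHD output.

```python
def q3(predicted, observed):
    """Percent of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


def consensus_q3(pred_a, pred_b, observed):
    """Accuracy on residues where two predictors agree, plus the percent of such residues."""
    if not (len(pred_a) == len(pred_b) == len(observed)) or not observed:
        raise ValueError("need three non-empty strings of equal length")
    agreed = [(a, o) for a, b, o in zip(pred_a, pred_b, observed) if a == b]
    coverage = 100.0 * len(agreed) / len(observed)
    if not agreed:
        return 0.0, coverage
    accuracy = 100.0 * sum(1 for a, o in agreed if a == o) / len(agreed)
    return accuracy, coverage


print(q3("HHHEECCC", "HHHEECCH"))            # 87.5
print(consensus_q3("HHEC", "HHCC", "HHEC"))  # (100.0, 75.0)
```

The second function illustrates why a joint prediction can report a higher accuracy than either method alone: it is evaluated only on the subset of residues where both methods give the same state.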
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references and journals available online, including PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Identified the template structure through a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified the model using different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
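The agreement between matched Cα atoms quoted above is usually expressed as a root mean square deviation (RMSD). A minimal sketch of that calculation follows, assuming the two coordinate lists are already matched residue by residue and already superposed; the optimal superposition step (for example the Kabsch algorithm) is deliberately left out, and the function name and coordinates are invented.

```python
import math


def ca_rmsd(coords_model, coords_reference):
    """RMSD in angstroms between matched, pre-superposed C-alpha coordinates."""
    if len(coords_model) != len(coords_reference) or not coords_model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_model, coords_reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(coords_model))


# Hypothetical coordinates: every atom displaced by 2 angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model, reference))  # 2.0
```

Without the superposition step the value also reflects any rigid-body offset between the two structures, so in practice the coordinates would be fitted onto each other first.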
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment - of which PSI-BLAST is
the most common example - iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
41
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production however more
sophisticated approaches have also been explored
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand with a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ against ψ of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii, and the angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
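As an illustration of how a single torsion angle such as φ or ψ can be obtained from atomic coordinates, the following sketch computes the dihedral angle defined by four points using the standard atan2 formulation. The function name and the test coordinates are made up for the example; for φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the signed dihedral angle p0-p1-p2-p3 in degrees (-180 to 180)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis arrangement gives 0 degrees, trans gives 180.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Computing φ and ψ this way for every residue and plotting the pairs gives the scatter of points that a Ramachandran plot displays.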
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that
the coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file with the file extension .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
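The idea of gradient optimization can be illustrated with a toy case: two atoms interacting through a Lennard-Jones potential, started too close together so that the van der Waals repulsion dominates. This is a minimal sketch in reduced units (epsilon = sigma = 1); the step size, move cap and tolerance are assumptions chosen for the example, and real minimizers work on all atoms and all force-field terms at once.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, rate=0.001, max_move=0.05, tol=1e-8, max_steps=100000):
    """Steepest descent on the separation r until the force is nearly zero."""
    for _ in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Move downhill, capping the step so a huge repulsion cannot fling the atoms apart.
        move = max(-max_move, min(max_move, -rate * g))
        r += move
    return r


start = 0.9  # atoms too close: large positive (repulsive) energy
r_min = minimize(start)
print(f"start energy {lj_energy(start):.3f}, minimized energy {lj_energy(r_min):.3f}")
print(f"minimized separation {r_min:.4f} (analytical minimum is 2**(1/6) * sigma)")
```

The procedure only walks downhill from the starting point, which is why, as noted above, minimization ends in the nearest local minimum rather than crossing energy barriers.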
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp       27    6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                  27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
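The extinction coefficients and charge totals above can be cross-checked from the residue counts. The sketch below uses the per-residue molar absorptivities at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the molecular weight 55252.6 is read from the listing above with the decimal point restored, which is an assumption about the garbled extraction, and the function name is made up for the example.

```python
# Residue counts taken from the ProtParam listing in this report.
counts = {"Asp": 21, "Glu": 28, "Arg": 33, "Lys": 10, "Tyr": 5, "Trp": 4, "Cys": 5}
molecular_weight = 55252.6  # assumed reading of the garbled value "552526"


def extinction_coefficient(n_tyr, n_trp, n_cystine):
    """Molar extinction coefficient at 280 nm in M-1 cm-1."""
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


# With 5 Cys residues at most 2 disulfide-bonded cystines can form.
eps_all_cystine = extinction_coefficient(counts["Tyr"], counts["Trp"], counts["Cys"] // 2)
eps_no_cystine = extinction_coefficient(counts["Tyr"], counts["Trp"], 0)

print(eps_all_cystine, round(eps_all_cystine / molecular_weight, 3))
print(eps_no_cystine, round(eps_no_cystine / molecular_weight, 3))
print("negative:", counts["Asp"] + counts["Glu"], "positive:", counts["Arg"] + counts["Lys"])
```

Dividing the coefficient by the molecular weight gives the absorbance of a 1 g/l solution, which is the second figure ProtParam reports on each line.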
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states  (?):   0 is  0.00%
Other states         :   0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
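The pairwise scores in the table are percent-identity style values. One common way to compute such a value from two aligned sequences is sketched below; treating the score as identities divided by ungapped aligned positions is an assumption for illustration and may differ in detail from what ClustalW2 computes internally. The two example strings are the first aligned stretch of HBVA4 and HBVA5 from the alignment in this report.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# First aligned stretch of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(f"{percent_identity(hbva4, hbva5):.2f}%")
```

Over this short stretch the two sequences differ at a single position, consistent with the high full-length score (97) reported for this pair in the table.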
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file).
2) Choose icon: Swiss model - load the raw target sequence.
3) Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose icon: File - save - layer (pdb).
5) Choose icon: File - save - project (pdb).
6) Choose icon: Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose icon: File - save - layer (pdb).
Fig. Open Swiss model and select the load raw sequence option to load the target molecule
Fig. Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Fig. Save the file as the project
Fig. Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide by displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
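The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The short sketch below recomputes them; the counts are copied from the PROCHECK summary above, and the variable names are chosen for the example.

```python
# Residue counts per Ramachandran region, from the PROCHECK summary above.
regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(regions.values())  # non-glycine, non-proline residues
percentages = {name: round(100.0 * n / total, 1) for name, n in regions.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```

By this guideline the model reported here falls short of the "good quality" threshold, with 85.6% of residues in the most favoured regions.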
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
75
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
77
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
78
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
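The search-space figures reported in the docking log above can be cross-checked with a little arithmetic. The sketch below assumes that the extraction stripped commas and decimal points, so that `Iterate[278121]` reads as 27 x 812 x 1, `FFT[642448]` reads as 64 x 24 x 48, and the distance range reads as 0.00 to 19.50 in steps of 0.75. These readings, the variable names, and the meaning given to each factor are our interpretation, not something the log states explicitly.

```python
import math

# Assumed readings of the punctuation-stripped log values (hypothetical names).
ITERATE = (27, 812, 1)        # assumed: distance steps, receptor rotations, ligand rotations
FFT_GRID = (64, 24, 48)       # assumed: dimensions of each 3D FFT
R_MAX, R_STEP = 19.50, 0.75   # assumed: distance range upper bound and step size


def distance_steps(r_max, r_step):
    """Number of sampled distances from 0 to r_max inclusive."""
    return int(round(r_max / r_step)) + 1


n_steps = distance_steps(R_MAX, R_STEP)
n_ffts = math.prod(ITERATE)
n_orientations = n_ffts * math.prod(FFT_GRID)

print(n_steps)          # number of R values scanned
print(n_ffts)           # number of 3D FFTs
print(n_orientations)   # size of the 6D search space
```

Under these assumptions the three values come out as 27, 21924 and 1616412672, which agrees with the 27 `R =` lines and with the log line reporting 21924 3D FFTs for 1616412672 orientations.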
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will soon be possible to draw a conclusive picture of its evolution, and
that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling, and went a step further to present
a theoretical model built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is one
of the secondary causes of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases, so that possible therapeutic sites can be identified in
them, and we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and much work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and will prove fruitful to anyone working in a similar
area.
BIBLIOGRAPHY
[1] - Lansing M Prescott, John P Harley and Donald A Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796–14801 [PubMed]
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395–408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389–3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro:
An integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigació Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Hepadnaviruses have life cycles similar to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.
Classification and general features
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.
Fig: Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver,
and virus particles, as well as excess viral surface protein, are shed in
large amounts into the blood. Viraemia is prolonged, and the blood of
infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies and young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal
liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure. Patients who become
persistently infected are at risk of developing hepatocellular carcinoma
(HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV, including that of the chronic carrier.
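The marker-by-marker reading given above can be written down as a small lookup. The following is only an illustrative sketch: the function name, argument names, and wording of the findings are our own, the final "no evidence" branch is an added assumption, and the code simply restates the rules in the text rather than offering a diagnostic procedure.

```python
def interpret_hbv_serology(hbsag, hbeag, anti_hbs, anti_hbe, core_igm, core_igg):
    """Translate positive/negative marker results into the readings listed in the text.

    Every argument is a bool (True = marker detected in serum).
    Returns a list of plain-language findings, in a fixed order.
    """
    findings = []
    if hbsag:
        findings.append("virus replication is occurring in the liver")
    if hbeag:
        findings.append("high level of viral replication")
    if anti_hbs:
        findings.append("immunity following infection")
    if anti_hbe:
        findings.append("low infectivity")
    if core_igm:
        findings.append("recent infection")
    if core_igg:
        findings.append("exposure to HBV")
    if not findings:
        # Assumption: with no marker detected there is nothing to report.
        findings.append("no serological evidence of HBV exposure")
    return findings


# Example: a pattern with surface antibody, e antibody and core IgG only.
resolved = interpret_hbv_serology(
    hbsag=False, hbeag=False, anti_hbs=True,
    anti_hbe=True, core_igm=False, core_igg=True,
)
print(resolved)
```

Keeping each marker as an independent rule mirrors the way the text lists them, so a combination of findings is reported rather than a single label.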
Fig: Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following a single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue, and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and occasionally
leads to hospitalization, but acute hepatitis B resolves completely in 95%
of those infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their ongoing
infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but less than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
The transcription and translation of these proteins is achieved through the
use of multiple in-frame start codons. The HBV genome also contains elements
that regulate transcription, determine the site of polyadenylation, and mark
a specific transcript for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
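To make the idea of reading frames on a circular genome with multiple in-frame start codons more concrete, here is a minimal sketch of a naive ORF scan. It is not the method used in this work: the function name, the toy sequence, and the `min_codons` parameter are hypothetical, only one strand is scanned (matching the statement that the frames run in one direction), and real HBV annotation relies on curated coordinates rather than a scan like this.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs_circular(genome, min_codons=3):
    """Naively list ORFs on the given strand of a circular genome.

    Returns tuples (start_index, end_index, codon_count), where end_index is
    the position just after the stop codon, wrapped around the origin, and
    codon_count includes the start and stop codons. Every in-frame ATG is
    reported separately, so several entries may share one stop codon.
    """
    n = len(genome)
    doubled = genome.upper() * 2  # lets an ORF run across the origin
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != START:
            continue
        # Walk codon by codon, at most once around the circle.
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOPS:
                codons = (pos + 3 - start) // 3
                if codons >= min_codons:
                    orfs.append((start, (pos + 3) % n, codons))
                break
    return orfs


# Toy sequence (not HBV): two in-frame start codons share one stop codon.
toy = "ATGAAAATGCCCTAAGG"
print(find_orfs_circular(toy))
```

In the toy sequence both start codons end at the same stop, which is the situation the text describes when several proteins are produced from one reading frame through alternative in-frame starts.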
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known
that the virion initially attaches to a susceptible hepatocyte through recognition of
a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then
enters the nucleus, where it is known to form a covalently closed circular form
called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a
longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by
ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are
destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds
to a specific site at the 3' end of its own transcript, where viral DNA synthesis
eventually occurs. At the same time as capsid formation, the RNA-P protein complex
is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the
process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm, and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually make up a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to collectively as surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein. The absence of the
hepatitis B core, polymerase, and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles
present the same sites as the virion, they induce a significant immune
response and are thought to be disadvantageous for the virus. However, it is
also believed that the presence of high levels of non-infectious particles
may allow the infectious viral particles to travel through the blood stream
undetected by antibodies (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B
surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope, which may be responsible for
triggering the immune response. Despite the high antigenicity and prevalence
of these particles, the immune system appears largely oblivious to their
presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by a blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185 amino acid protein expressed
in the cytoplasm of infected cells and is closely associated with
nucleocapsid assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by a blood
test. If found, it is usually indicative of complete virus particles in
circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains three distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. Of
these, the Dane particle is the infective form of the virus. Hepatitis B is
normally transmitted by blood transfusion, contaminated equipment, drug
users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete down
regulation of viral gene expression and the infection of immunologically
privileged tissues. Chronic liver cell injury and the attendant inflammatory
and regenerative responses create the mitogenic and mutagenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma.
Elucidation of the immunological and virological basis for HBV persistence
may yield immunotherapeutic and antiviral strategies to terminate chronic
HBV infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed with an
emphasis on past accomplishments as well as goals for the future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
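The marker interpretations listed above can be collected into a small lookup table. The following is an illustrative sketch only: the dictionary keys and the interpret helper are hypothetical names, and the interpretation strings simply restate the lists above rather than offering clinical guidance.

```python
# Interpretation of each serological marker, restating the lists above.
# The dictionary keys and the interpret() helper are hypothetical names.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def interpret(detected_markers):
    """Return the meaning of each detected marker, in input order."""
    unknown = [m for m in detected_markers if m not in MARKER_MEANING]
    if unknown:
        raise ValueError(f"unknown marker(s): {unknown}")
    return [f"{m}: {MARKER_MEANING[m]}" for m in detected_markers]


if __name__ == "__main__":
    for line in interpret(["HBsAg", "HBeAg", "core IgM"]):
        print(line)
```

Keeping the meanings in one table makes it easy to check a serum profile marker by marker against the descriptions given in the text.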
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
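The 30% rule can be checked directly on a pairwise alignment. The sketch below is a minimal illustration, not the method of any particular tool: the function names are hypothetical, and identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_30_percent_rule(aligned_a, aligned_b):
    """True when the pair is above the 30% identity threshold."""
    return percent_identity(aligned_a, aligned_b) > 30.0
```

For example, percent_identity("ACDEFG", "ACDQFG") compares six columns and finds five matches, so the pair sits well above the threshold.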
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts) and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
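To make the idea of an optimal local alignment concrete, the following is a minimal sketch of Smith-Waterman scoring. It is not the SSEARCH implementation: the function name and the match, mismatch and gap values are illustrative assumptions, and a simple linear gap penalty is used where production tools rely on substitution matrices and affine gap penalties.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    cols = len(seq_b) + 1
    previous = [0] * cols  # row i-1 of the dynamic-programming matrix
    best = 0
    for a in seq_a:
        current = [0] * cols
        for j in range(1, cols):
            diagonal = previous[j - 1] + (match if a == seq_b[j - 1] else mismatch)
            up = previous[j] + gap       # gap in seq_b
            left = current[j - 1] + gap  # gap in seq_a
            # A cell never drops below zero, which is what makes the
            # alignment local rather than global.
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best
```

Only two rows of the matrix are kept because this sketch reports the score alone; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.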
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
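As an illustration of how one of these parameters is obtained from the sequence alone, the sketch below computes the aliphatic index using the standard definition, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percentage of each residue. The function name is a hypothetical choice, and the input cleaning mirrors the statement above that white space and numbers are ignored.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    # Keep letters only, so white space and numbers are ignored.
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    a, b = 2.9, 3.9  # relative side-chain volumes of Val and of Ile/Leu vs Ala
    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))
```

A sequence rich in Ala, Val, Ile and Leu gives a high index, while a sequence with none of these four residues gives zero.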
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
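The percentages quoted above are per-residue accuracies over three states. A minimal sketch of that measure follows; the function name and the single-letter state codes (H for helix, E for sheet, C for coil) are assumptions made for illustration and are not taken from the SOPMA paper.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("empty state strings")
    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be H (helix), E (sheet) or C (coil)")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

With a predicted string "HHHHEECC" and an observed string "HHHCEECC", seven of eight residues agree, giving 87.5%.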
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N).
Loaded the target sequence (in pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
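Agreement between matched Cα atoms is commonly expressed as a root-mean-square deviation (RMSD). The sketch below assumes that this is the measure behind the figures quoted above, and that the model and reference structure have already been superimposed and their Cα atoms paired one-to-one; the function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. angstroms) between matched C-alpha positions."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must pair up one-to-one")
    if not coords_a:
        raise ValueError("no atoms to compare")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

If every atom of the model is displaced by 2 units along one axis relative to the reference, the function returns 2. No optimal superposition is performed here, so the result depends on how the two structures were fitted beforehand.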
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
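The E-value screening step described above can be sketched as a simple filter. Everything in the example is an assumption made for illustration: the Hit record, the select_templates helper, the cutoff of 1e-5 and the three hits are hypothetical and do not come from the searches reported in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    template_id: str
    score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


if __name__ == "__main__":
    # Hypothetical search results, for illustration only.
    hits = [
        Hit("tmplA", 180.0, 3e-48),
        Hit("tmplB", 28.0, 5.2),
        Hit("tmplC", 95.0, 1e-20),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.e_value)
```

A hit with a poor E-value is dropped even when nothing else remains, in line with the advice above that such a template should generally not be chosen.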
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand-flexible receptor complex often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than
the receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A molecule on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell or on the cell surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG‡ (free
energy of activation) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
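Each point on a Ramachandran plot comes from two torsion angles computed from backbone atom coordinates. The sketch below shows one common way to compute a signed torsion angle from four points; the function names are hypothetical. By the usual convention, φ of residue i uses the atoms C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    if length == 0.0:
        raise ValueError("p1 and p2 coincide")
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))
```

Four atoms lying in one plane give 0 degrees when the outer atoms are on the same side of the central bond (cis) and 180 degrees when they are on opposite sides (trans).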
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and may stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
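The case of two atoms placed too close together can be used to sketch gradient-based minimization. The example below is a toy model under stated assumptions: a single Lennard-Jones pair in reduced units (epsilon = sigma = 1), a steepest-descent update on the separation alone, and hypothetical function names, step size and move cap. It is not the minimizer used by any modelling package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_separation(r, step=0.01, max_move=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent on r, with each move capped at max_move."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break  # the net force is effectively zero
        move = -step * grad  # move downhill, against the gradient
        move = max(-max_move, min(max_move, move))
        r += move
    return r
```

Starting from a separation of 0.9, where the repulsion is large, the atoms are pushed apart until the force vanishes near r = 2^(1/6), the nearby minimum of this potential. Capping each move keeps the very large initial force from throwing the atoms far past the minimum.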
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp      27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                 27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count  Percent
Ala (A)     74    14.1%
Arg (R)     33     6.3%
Asn (N)     11     2.1%
Asp (D)     21     4.0%
Cys (C)      5     1.0%
Gln (Q)     32     6.1%
Glu (E)     28     5.4%
Gly (G)     51     9.8%
His (H)     16     3.1%
Ile (I)     15     2.9%
Leu (L)     81    15.5%
Lys (K)     10     1.9%
Met (M)     12     2.3%
Phe (F)     10     1.9%
Pro (P)     28     5.4%
Ser (S)     27     5.2%
Thr (T)     22     4.2%
Trp (W)      4     0.8%
Tyr (Y)      5     1.0%
Val (V)     38     7.3%
Pyl (O)      0     0.0%
Sec (U)      0     0.0%
(B)          0     0.0%
(Z)          0     0.0%
(X)          0     0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
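The two extinction coefficients can be cross-checked from the composition reported above (5 Tyr, 4 Trp, 5 Cys). The sketch below uses the standard per-residue contributions at 280 nm (1490 for Tyr, 5500 for Trp and 125 for each cystine); the function names are hypothetical, and pairing Cys residues two at a time is the assumption behind the half-cystine case.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, cystines_formed):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    # Each cystine is a pair of Cys residues; an unpaired Cys adds nothing.
    n_cystine = n_cys // 2 if cystines_formed else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight
```

With these counts the function gives 29700 when cystines are formed and 29450 when they are not, and dividing by the molecular weight of 55252.6 gives 0.538 and 0.533 respectively, in agreement with the reported output.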
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
310 helix        (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
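The guide tree is a nested, parenthesised tree description in Newick style. The sketch below shows one way to read such a string and list each leaf with its branch length. It assumes that the digit runs after each name are branch lengths written as name:length (for example 0.01176); the helper names parse_newick and leaves are ours, for illustration only.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop line breaks and spaces
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                if text[pos] == ")":
                    pos += 1
                    break
                raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        # Optional node name, then optional ":length".
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()


def leaves(node):
    """Return (name, branch_length) for every leaf, left to right."""
    name, length, children = node
    if not children:
        return [(name, length)]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


# The duck hepatitis B clade of the guide tree, under the assumed reading.
clade = parse_newick("(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054;")
for leaf_name, branch_length in leaves(clade):
    print(leaf_name, branch_length)
```

Short branch lengths between two leaves, as in this clade, correspond to the high pairwise scores between the same sequences.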
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - save - layer (pdb)
Choose File - save - project (pdb)
Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles of each residue and shows which combinations are possible for a polypeptide, so residues with strained or unlikely backbone geometry stand out.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
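Each point in a Ramachandran plot is a pair of dihedral angles. φ is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation from four atom coordinates; it is not the code used by SAVS or PROCHECK, and the function name dihedral is ours.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only, not taken from the modelled structure.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to consecutive backbone atoms of every residue gives the (φ, ψ) pairs that the plot classifies into favoured, allowed and disallowed regions.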
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %age
Residues in most favoured regions [A,B,L]                  990   85.6
Residues in additional allowed regions [a,b,l,p]           104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                              51    4.4
                                                          ----  -----
Number of non-glycine and non-proline residues            1156  100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
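The percentages in the PROCHECK summary follow directly from the residue counts. The short sketch below recomputes them and checks the model against the 90% guideline quoted above; the variable names are ours, and the counts are read from the summary.

```python
# Residue counts per Ramachandran region, as reported in the summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the ones classified in the plot.
classified = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / classified, 1)
    for region, count in region_counts.items()
}

# End residues, glycines and prolines are counted separately.
total_residues = classified + 8 + 127 + 60

meets_guideline = percentages["most favoured"] > 90.0

print(classified, total_residues)
print(percentages)
print("over 90% in most favoured regions:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline for a good quality model, and 4.4% of residues lie in disallowed regions.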
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
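The size of the search reported in the Hex log can be cross-checked with a little arithmetic. The sketch below rests on two assumptions about digits whose separators were lost in the log: that the distance range runs from 0.00 to 19.50 in steps of 0.75, and that "Iterate[278121]" and "FFT[642448]" stand for the factors 27, 812, 1 and 64, 24, 48. Both readings reproduce the totals printed in the log, but they remain our interpretation, and the function name radial_distances is ours.

```python
from math import prod


def radial_distances(start: float, stop: float, step: float) -> list:
    """Intermolecular distances sampled from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


# Assumed reading of "distance range = 000 to 1950 with steps of 075".
distances = radial_distances(0.0, 19.5, 0.75)

# Assumed reading of "Iterate[278121] x FFT[642448]".
ITERATE = (27, 812, 1)   # radial steps x receptor rotations x ligand rotations
FFT_GRID = (64, 24, 48)  # orientations covered by each 3D FFT

fft_count = prod(ITERATE)
orientations = fft_count * prod(FFT_GRID)

print(len(distances))  # number of radial steps
print(fft_count)       # compare with "Done 21924 3D FFTs"
print(orientations)    # compare with "= 1616412672"
```

Under this reading the 27 radial steps match the 27 "R =" lines of the first search pass, and the product of the factors matches the number of orientations stated in the log.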
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that, although the strains are all closely related, each plays an important
role in survival in its own host species. It would be interesting to take a
closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution, and the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling. We went a step further and
presented a theoretical model built with available online modelling tools.
According to our study, the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards
such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search
for a treatment for hepatitis B, and may also open new avenues for finding
other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. The same method can
also be applied to other infectious diseases, and we can look forward to a
better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope that these findings will go a
long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology,
6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental
Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia,
Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia University,
New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and
alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W.
and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of
protein database search programs. Nucleic Acids Res. 25: 3389-3402.
[PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M.,
Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An
integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown
Clinical Features
Incubation period: 2 - 5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver,
and virus particles, as well as excess viral surface protein, are shed in
large amounts into the blood. Viraemia is prolonged and the blood of
infected individuals is highly infectious.
Complications
1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes, and on-going liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal
liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
World-wide, there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence indicates
exposure to HBV.
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and occasionally
leads to hospitalization, but acute hepatitis B resolves completely in 95%
of those infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their on-going
infection, although some have persistent fatigue.
Molecular virology
The genome is circular, 3.2 kb in size and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
These proteins are transcribed and translated through the use of multiple
in-frame start codons. The HBV genome also contains parts that regulate
transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell
that is capable of supporting its replication. Although hepatocytes are
known to be the most effective cell type for replicating HBV, other types of
cells in the human body have been found to support replication to a lesser
degree.
The initial steps following HBV entry are not clearly defined, although it
is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified
(Garces HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum,
and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 5' end of its own transcript, where
viral DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30
molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al. 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm, and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present, and these usually outnumber the virions in a ratio of 100:1. The
two subviral particles, the hepatitis B filament and the hepatitis B sphere,
are often referred to together as surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein. The absence of the hepatitis B
core, polymerase and genome makes these particles non-infectious. High
levels of these non-infectious particles can be found during the acute phase
of the infection. Since the non-infectious particles present the same sites
as the virion, they induce a significant immune response and are thought to
be disadvantageous for the virus. However, it is also believed that the
presence of high levels of non-infectious particles may allow the infectious
viral particles to travel undetected by antibodies through the blood stream
(Garces HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B
surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope, which may be responsible for
triggering the immune response. Regardless of the high antigenicity and
prevalence of these particles, the immune system appears basically oblivious
to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185-amino-acid protein expressed
in the cytoplasm of infected cells and is closely associated with
nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. Serum of individuals
infected with hepatitis B contains three distinct antigen particles: a
spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles
that vary in length. The Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
down-regulation of viral gene expression and the infection of
immunologically privileged tissues. Chronic liver cell injury and the
attendant inflammatory and regenerative responses create the mitogenic and
mutagenic stimuli for the development of DNA damage that can cause
hepatocellular carcinoma. Elucidation of the immunological and virological
basis for HBV persistence may yield immunotherapeutic and antiviral
strategies to terminate chronic HBV infection and reduce the risk of its
life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed with an
emphasis on past accomplishments as well as goals for the future.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent
differences between the template and query structures. As mentioned above,
homology modeling figures heavily as a rationale for structural genomics
initiatives, under the stated assumption that accurate models can be built
for query sequences that have greater than 30% sequence identity with their
best template.
The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
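To make the 30% rule concrete, the following minimal Python sketch computes percent identity from a pairwise alignment. It assumes one common definition (identical residues divided by the number of columns in which neither sequence has a gap); other definitions exist, and the function names and the example alignment are illustrative only, not taken from this work.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings must come from the same alignment, so they have equal
    length, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    identical_columns = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if q == t:
            identical_columns += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


def above_identity_threshold(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """True when the pair lies above the (assumed) 30% identity guideline."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment: 4 identical residues in 5 gap-free columns.
print(percent_identity("ACDE-F", "ACDQGF"))
```

The threshold is a parameter because the 30% figure is a rule of thumb, not a sharp boundary.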
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making
optimal use of chemical information represents an efficient knowledge-based
strategy for improving binding affinity estimations, ligand binding-mode
predictions and virtual screening enrichments obtained from protein-ligand
docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based
virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface
between computer science and biological science. It involves the technology
that uses computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome data
and disseminates biomedical information, all for the better understanding
of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in
the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs
in this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts) and ordered or unordered peptide
searches. Recent versions of the FASTA package include special translated
search algorithms that correctly handle frameshift errors (which
six-frame-translated searches do not handle very well) when comparing
nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major
focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA package is
available from fasta.bioch.virginia.edu.
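As an illustration of the Smith-Waterman local alignment idea mentioned for SSEARCH, here is a minimal Python sketch that returns only the best local alignment score. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (all three values are arbitrary choices for the example), whereas real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                    # start a new local alignment
                previous_row[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous_row[j] + gap,                # gap in b
                current_row[j - 1] + gap,             # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


# Toy sequences: the shared segment "ACGT" scores 4 matches x 2 = 8.
print(smith_waterman_score("XXACGTXX", "ACGT"))
```

Resetting negative cells to zero is what makes the alignment local: a poor region cannot drag down the score of a good segment elsewhere.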
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an
algorithm for comparing primary biological sequence information, such as
the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query sequence
with a library or database of sequences, and identify library sequences that
resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be deduced
from a protein sequence. No additional information is required about the
protein under consideration. The protein can be specified either as a
Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence.
White space and numbers are ignored. If you provide the accession number of
a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page
that allows you to select the portion of the sequence on which you would
like to perform the analysis. The choice includes a selection of mature
chains or peptides and domains from the Swiss-Prot feature table (which can
be chosen by clicking on the positions), as well as the possibility to enter
start and end positions in two boxes. By default (i.e. if you leave the two
boxes empty), the complete sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
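As an example of how one of these parameters can be derived from the sequence alone, the sketch below computes the aliphatic index using the commonly cited formula of Ikai (1980): the mole percent of Ala, plus 2.9 times the mole percent of Val, plus 3.9 times the combined mole percent of Ile and Leu. This is an independent sketch, not ProtParam's code; the function name and the toy sequence are assumptions for illustration.

```python
def aliphatic_index(sequence: str, a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index: X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)).

    X(...) are mole percentages. White space and digits in the input are
    ignored, mirroring how raw sequences are usually pasted into web tools.
    """
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    total = len(residues)

    def mole_percent(amino_acid: str) -> float:
        return 100.0 * residues.count(amino_acid) / total

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Toy peptide with one each of Ala, Val, Ile, Leu (25 mole percent each).
print(round(aliphatic_index("AVIL"), 1))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains.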
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA) correctly
predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database
containing 126 chains of non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
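Percentages such as these are per-residue accuracies for a three-state prediction. A minimal Python sketch of that measure follows, assuming the usual one-letter state codes H (helix), E (sheet) and C (coil); the function name and the example strings are made up for illustration and are not SOPMA output.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("no residues to compare")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Hypothetical 8-residue example: 7 of 8 states agree.
print(three_state_accuracy("HHHEECCC", "HHHEEECC"))
```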
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N)
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
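The sequence-retrieval step in the protocol above yields text in FASTA format. A minimal sketch of reading such text in Python is shown here, assuming standard FASTA conventions (a header line starting with ">" followed by one or more sequence lines); the record names and residues in the example are placeholders, not the actual Swiss-Prot entry.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder records for illustration only.
example = ">seq1 demo\nMKT\nAAL\n>seq2\nGG\n"
print(parse_fasta(example))
```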
Homology modeling
In protein structure prediction, homology modeling, also known as
comparative modeling, is a class of methods for constructing an
atomic-resolution model of a protein from its amino acid sequence (the
"query sequence" or "target"). Almost all homology modeling techniques rely
on the identification of one or more known protein structures (known as
"templates" or "parent structures") likely to resemble the structure of the
query sequence, and on the production of an alignment that maps residues in
the query sequence to residues in the template sequence. The sequence
alignment and template structure are then used to produce a structural
model of the target. Because protein structures are more conserved than
protein sequences, detectable levels of sequence similarity usually imply
significant structural similarity.
The quality of the homology model is dependent on the quality of the
sequence alignment and template structure. The approach can be complicated
by the presence of alignment gaps (commonly called indels) that indicate a
structural region present in the target but not in the template, and by
structure gaps in the template that arise from poor resolution in the
experimental procedure (usually X-ray crystallography) used to solve the
structure. Model quality declines with decreasing sequence identity; a
typical model has ~2 Å agreement between the matched Cα atoms at 70%
sequence identity but only 4-5 Å agreement at 25% sequence identity.
Regions of the model that were constructed without a template, usually by
loop modeling, are generally much less accurate than the rest of the model,
particularly if the loop is long. Errors in side chain packing and position
also increase with decreasing identity, and variations in these packing
configurations have been suggested as a major reason for poor model quality
at low identity [2]. Taken together, these various atomic-position errors
are significant and impede the use of homology models for purposes that
require atomic-resolution data, such as drug design and protein-protein
interaction predictions; even the quaternary structure of a protein may be
difficult to predict from homology models of its subunit(s). Nevertheless,
homology models can be useful in reaching qualitative conclusions about the
biochemistry of the query sequence, especially in formulating hypotheses
about why certain residues are conserved, which may in turn lead to
experiments to test those hypotheses. For example, the spatial arrangement
of conserved residues may suggest whether a particular residue is conserved
to stabilize the folding, to participate in binding some small molecule, or
to foster association with another protein or nucleic acid.
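Agreement between matched Cα atoms is usually reported as a root-mean-square deviation (RMSD) in Å. The following Python sketch computes that value for two lists of matched coordinates. It assumes the two structures have already been superimposed and that the atoms are listed in matching order; it does not perform the superposition itself, and the coordinates in the example are invented.

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD between matched C-alpha coordinates given as (x, y, z) tuples.

    Assumes the structures are already superimposed and equally ordered.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("no atoms to compare")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Invented coordinates: every atom is displaced by 2 angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model, reference))
```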
Figure: First, the known template 3D structures are aligned with the target
sequence to be modelled. Second, spatial features such as Cα-Cα distances,
hydrogen bonds, and main chain and side chain dihedral angles are
transferred from the templates to the target. Thus, a number of spatial
restraints on its structure are obtained. Third, the 3D model is obtained by
satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of
a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds.
The chief inaccuracies in homology modeling, which worsen with lower
sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure
prediction, current practice in homology modeling is assessed in a biennial
large-scale experiment known as the Critical Assessment of Techniques for
Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
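
The E-value screen described above can be sketched in a few lines of Python. This is only an illustration: the hit identifiers and values are hypothetical, and the cutoff of 1e-5 is an assumed example threshold, not a value taken from this report.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template: str  # identifier of the candidate template (hypothetical here)
    score: float   # alignment score reported by the database search
    evalue: float  # expected number of chance hits with this score


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical search results: three strong hits and two chance-level hits.
hits = [
    Hit("TPL2-A", 197, 2e-51),
    Hit("TPL4-B", 27, 6.0),
    Hit("TPL1-A", 206, 6e-54),
    Hit("TPL5-A", 27, 6.0),
    Hit("TPL3-A", 192, 6e-50),
]

candidates = select_templates(hits)
print([h.template for h in candidates])
```

Hits that fail the cutoff are dropped rather than ranked, which mirrors the advice that a template with a poor E-value should generally not be used even when nothing better is available.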
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
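
Two of the selection criteria above, sequence similarity and coverage, can be computed directly from a pairwise alignment. The sketch below assumes a simple definition (identity over aligned, non-gap columns; coverage as the fraction of query residues aligned to a template residue); the function name and the example sequences are hypothetical.

```python
def identity_and_coverage(query_aln, template_aln):
    """Return (percent identity, percent coverage) for two aligned strings.

    Both strings must have the same length and use '-' for gaps.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0    # columns where both sequences have a residue
    identical = 0  # aligned columns where the residues match
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                identical += 1
    query_len = sum(1 for q in query_aln if q != "-")
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / query_len if query_len else 0.0
    return identity, coverage


# Hypothetical 10-residue query aligned to a shorter template.
identity, coverage = identity_and_coverage("MKTAYIAKQR", "MKS-YIAK--")
print(f"identity {identity:.1f}%, coverage {coverage:.1f}%")
```

A template can score well on identity yet cover only part of the query, which is why both numbers are worth inspecting before a template is chosen.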
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
In the rigid-ligand, flexible-receptor case, the native structure often
maximizes the interface area between the molecules. They move with respect to
one another in a direction perpendicular to the interface. This allows
a receptor to bind a larger-than-usual ligand. Normally, when
the ligand overlaps the docking interface, energy penalties are incurred. If the
van der Waals repulsion can be decreased, the energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement, which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
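
The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a short Python sketch. The rotation order (about x, then y, then z) and the two-atom "ligand" are assumptions made for the example; docking programs may parameterize rotations differently.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(coords, rotation, translation):
    """Apply the rotation, then the translation, to every (x, y, z) point."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        ))
    return moved


# Hypothetical two-atom ligand: rotate 90 degrees about z, then shift it.
ligand = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pose = transform(ligand, rotation_matrix(0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
print(pose)
```

Because the transform is rigid, distances within the ligand are unchanged; only its position and orientation relative to the receptor differ between poses.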
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study will be
useful in finding a cure for the infectious disease bird flu, and will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His   0.360    Tyr  -0.040    Asp   0.045    Gly  -0.070
Trp  -0.140    Met   0.025    Val  -0.060    Asn   0.080
Leu  -0.180    Phe  -0.120    Gln   0.050    Cys   0.210
Ile  -0.005    Ala   0.025    Glu   0.050    Arg   0.055
Pro  -0.200    Lys   0.100    Thr   0.100    Ser   0.130
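
As a small illustration of how such a preference scale can be used, the Python sketch below ranks the residues from most to least enriched in functional regions. It assumes the listed figures are decimal values (for example 0.360 for His and -0.200 for Pro); the function name is hypothetical.

```python
# Preference values as listed above, read as decimals (an assumption).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def rank_by_propensity(scale):
    """Return residue names sorted from highest to lowest preference."""
    return sorted(scale, key=scale.get, reverse=True)


ranked = rank_by_propensity(PROPENSITY)
enriched = [aa for aa in ranked if PROPENSITY[aa] > 0]
print("most enriched:", ranked[:3])
print("most depleted:", ranked[-3:])
```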
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ)
of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations
are described by the torsion angles φ and ψ respectively, and the plot is
drawn between them. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii, and the angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
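
The torsion angles plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms (C-N-Cα-C for φ, N-Cα-C-N for ψ). A minimal Python sketch of the dihedral calculation follows; the coordinates are made-up test points rather than atoms from a real structure, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond runs along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Computing φ and ψ this way for every residue of a model yields the points that a validation program places on the plot.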
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory,
which have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
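
The idea of moving atoms to reduce the net force on them can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. The sketch below is a steepest-descent toy, not the minimizer used in this work; the reduced units (ε = σ = 1), the step size and the cap on the move per step are all assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force along r is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step_size=0.01, max_move=0.05, tol=1e-8, max_steps=1000):
    """Steepest descent on r, stopping when the force is nearly zero."""
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        move = -step_size * grad
        # Cap the displacement so a huge clash force cannot fling atoms apart.
        move = max(-max_move, min(max_move, move))
        r += move
    return r


r_start = 0.9  # atoms too close together: strongly repulsive
r_min = minimize(r_start)
print(f"E({r_start}) = {lj_energy(r_start):.3f}")
print(f"minimized r = {r_min:.5f}, E = {lj_energy(r_min):.5f}")
```

Starting from a clash, the pair relaxes to the nearest energy minimum at r = 2^(1/6) σ. A descent of this kind only ever moves downhill, which is why minimization relieves local strain but cannot cross an energy barrier to reach a different conformation.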
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
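
The two extinction coefficients reported by ProtParam can be reproduced from the residue counts. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1), takes the counts (4 Trp, 5 Tyr, 5 Cys) from the composition, and reads the molecular weight as 55252.6; the function names are hypothetical.

```python
# Molar extinction values at 280 nm in water, in M-1 cm-1.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys residues


def extinction_coefficient(n_tyr, n_trp, n_cys, cys_paired):
    """Estimate the molar extinction coefficient from residue counts."""
    # An odd Cys cannot form a pair, so integer division gives the pair count.
    n_cystine = n_cys // 2 if cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


MOLECULAR_WEIGHT = 55252.6
ext_paired = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, cys_paired=True)
ext_reduced = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, cys_paired=False)
print(ext_paired, round(absorbance_1_g_per_l(ext_paired, MOLECULAR_WEIGHT), 3))
print(ext_reduced, round(absorbance_1_g_per_l(ext_reduced, MOLECULAR_WEIGHT), 3))
```

The difference between the two coefficients (250) is the contribution of the two cystines that five Cys residues can form.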
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?) :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
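
The guide tree is written in Newick notation, where parentheses group clusters and each name is followed by a branch length. The Python sketch below parses such a string and sums branch lengths from the root to each leaf. It assumes the usual `name:length` form with decimal branch lengths (for example `Q8IVS8|GLCTK_HUMAN:0.59519`); the parser is a minimal illustration and its function names are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


GUIDE_TREE = """
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
"""

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:22s} {depth:.5f}")
```

Read this way, the human glycerate kinase sits on by far the longest path from the root, consistent with its very low pairwise scores against the HBeAg sequences.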
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - Save - Layer (PDB).
5) Choose File - Save - Project (PDB).
6) Choose Swiss model - submit modeling request. (A new browser window will be opened; load the PDB file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File - Save - Layer (PDB).
Fig: Open Swiss model and select the "load raw sequence" option to load the target molecule.
Fig: Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Fig: Save the file as the project.
Fig: Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the φ and ψ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score    %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
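
The percentages in the plot statistics follow directly from the residue counts, and the quality criterion is a simple threshold on the first of them. The Python sketch below recomputes them from the four counts reported above; the function name and the dictionary keys are hypothetical, and the 90% threshold is the PROCHECK rule of thumb quoted in the summary.

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Return region percentages and whether the core fraction exceeds the threshold."""
    counts = {
        "most favoured": core,
        "additional allowed": allowed,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())  # non-glycine, non-proline residues
    percentages = {region: 100.0 * n / total for region, n in counts.items()}
    return total, percentages, percentages["most favoured"] > threshold


total, percentages, good_quality = ramachandran_summary(990, 104, 11, 51)
for region, value in percentages.items():
    print(f"{region:20s} {value:5.1f}%")
print("meets the 90% criterion:", good_quality)
```

With 85.6% of residues in the most favoured regions, the model falls somewhat short of the 90% expected of a good quality structure.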
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
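The search-size figures in the log above can be cross-checked with a little arithmetic. The sketch below is not part of the Hex output. It assumes that the text extraction stripped decimal points and commas, so that the distance range reads 0.00 to 19.50 in steps of 0.75, `Iterate[278121]` reads as 27 x 812 x 1, and `FFT[642448]` reads as 64 x 24 x 48. All variable names are illustrative.

```python
# Assumed readings of the garbled log values (decimal points and commas restored).
r_min, r_max, r_step = 0.00, 19.50, 0.75    # intermolecular distance range and step
n_receptor_rotations = 812                  # "Receptor 812" in the log
n_ligand_rotations = 1                      # "Ligand 1" in the log
fft_grid = (64, 24, 48)                     # assumed split of "FFT[642448]"

# Number of distance steps, counting both ends of the range (R = 0.00 ... 19.50).
n_distance_steps = round((r_max - r_min) / r_step) + 1

# One 3D FFT is run per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distance_steps * n_receptor_rotations * n_ligand_rotations

# Each FFT scores every cell of the FFT grid, giving the total 6D search size.
n_orientations = n_ffts * fft_grid[0] * fft_grid[1] * fft_grid[2]

# Net formal charge from the residue counts reported in the log.
net_formal_charge = 104 - 114

print(n_distance_steps, n_ffts, n_orientations, net_formal_charge)
```

Under these readings the products reproduce the totals that the log prints intact (21924 3D FFTs, 1616412672 orientations, net formal charge -10), which supports the reconstruction of the bracketed values.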
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of its evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of a protein present in the different
species, we used homology modelling. We have taken a step forward by
presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, and hence we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Those who are at particular risk include:
babies and young children
immunocompromised patients
males more than females
The virus persists in the hepatocytes, and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal
liver damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue
and rapid progression to cirrhosis or liver failure.
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy
because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa
Worldwide there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly between areas.
In South Africa, infection is much more common in rural communities than in
the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and
indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Fig. Hepatitis B virus in serum
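The marker-by-marker readings listed above can be collected into a small lookup, which makes it easier to see what a given combination of positive results implies. The following is only an illustrative sketch: the dictionary wording paraphrases the lists above, the function names `interpret` and `anti_hbs_argues_against_carriage` are hypothetical, and nothing here is a clinical decision rule.

```python
# Readings paraphrased from the antigen and antibody lists above (illustrative only).
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV (remains present for life)",
}


def interpret(positive_markers):
    """Return the reading for each positive marker, in the order listed above."""
    positive = set(positive_markers)
    unknown = positive - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unrecognised marker(s): {sorted(unknown)}")
    return [f"{name}: {meaning}"
            for name, meaning in MARKER_MEANING.items()
            if name in positive]


def anti_hbs_argues_against_carriage(positive_markers):
    """The text states that anti-HBs is not found in chronic carriers."""
    return "anti-HBs" in set(positive_markers)


for line in interpret({"core IgM", "HBsAg", "HBeAg"}):
    print(line)
```

Keeping the readings in one table means the interpretation of each marker is stated once, and a combined report is just a filtered view of that table.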
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the
eyes). The acute hepatitis phase may last for several weeks and occasionally
leads to hospitalization, but acute hepatitis B resolves completely in 95%
of those infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic
carriers, or those with chronic hepatitis B, are not aware of their ongoing
infection, although some have persistent fatigue.
Molecular virology
Genome: circular, 3.2 kb in size, double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but always less than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for the
transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins is achieved through the use
of multiple in-frame start codons. The HBV genome also contains elements
that regulate transcription, determine the site of polyadenylation, and mark
a specific transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
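The idea that one reading frame can yield several proteins through multiple in-frame start codons can be illustrated with a toy example. The sketch below uses an invented 24-nucleotide sequence, not real HBV sequence, and the function name `in_frame_starts` is hypothetical. It lists every ATG that lies in frame before the first stop codon; each such start would give a product that shares the same C-terminal end but differs in length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(orf):
    """Offsets (0-based) of each ATG in frame 0 that precedes the first stop codon."""
    starts = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            break                      # reading frame ends here
        if codon == "ATG":
            starts.append(i)           # a possible translation start
    return starts


# Invented toy sequence, codon by codon: ATG GCA ATG CCT TTT ATG GGA TAA
toy_orf = "ATGGCAATGCCTTTTATGGGATAA"
starts = in_frame_starts(toy_orf)
stop = len(toy_orf) - 3                # the toy sequence ends with its stop codon

# Each start gives a nested coding sequence ending at the same stop codon.
for s in starts:
    print(s, toy_orf[s:stop])
```

In this toy frame the three starts give coding sequences of 7, 5 and 2 codons, all ending at the same position, which is the nesting pattern the paragraph above describes.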
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that
is capable of supporting its replication. Although hepatocytes are known to
be the most effective cell type for replicating HBV, other types of cells in
the human body have been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through
recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer-than-genome-length RNA called the pregenome and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum, and
the proteins that are destined to become HBV surface antigens in the viral
envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where
viral DNA synthesis eventually occurs. As the capsid forms, the RNA-P protein
complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where
the process is repeated, resulting in the accumulation of 10 to 30 molecules
of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm, and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present, and
these usually outnumber the virions in a ratio of 100:1. The two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to collectively as surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
the large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the
infection. Since the non-infectious particles present the same sites as the
virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the
presence of high levels of non-infectious particles may allow the infectious
viral particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B
surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen (Au
antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope, which may be responsible for
triggering the immune response. Despite the high antigenicity and prevalence
of these particles, the immune system appears basically oblivious to their
presence.
Hepatitis B Core Antigen (HBcAg) - This is the only HBV antigen that cannot
be detected directly by blood test; it can only be isolated by analyzing an
infected hepatocyte. It is a 185 amino acid protein expressed in the
cytoplasm of infected cells, and it is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in
circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains three distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. Of
these, the Dane particle is the infective form of the virus. Hepatitis B is
normally transmitted by blood transfusion, contaminated equipment, drug
users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens. While
neonatal tolerance probably plays an important role in viral persistence in
patients infected at birth, the basis for poor responsiveness in adult-onset
infection is not well understood and requires further analysis. Viral
evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development of
DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
Antigen
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
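The sequence identity threshold discussed above can be checked with a few lines of code. The following is a minimal sketch, assuming the query and template have already been aligned and that identity is counted over aligned (non-gap) columns; other conventions divide by the length of the shorter sequence instead. The function names and the example sequences are hypothetical and only illustrate the calculation.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are skipped under this convention
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


def passes_identity_rule(identity: float, threshold: float = 30.0) -> bool:
    """Apply the rule of thumb: template identity should exceed ~30%."""
    return identity > threshold


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the study.
    identity = percent_identity("MKT-AYIAK", "MKSLAYLAR")
    print(f"{identity:.1f}% identity, usable: {passes_identity_rule(identity)}")
```

The threshold is a rule of thumb rather than a hard limit, so a template just below it may still be worth inspecting.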
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed in recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
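To make the idea of an optimal local alignment concrete, the sketch below implements the Smith-Waterman recurrence with a traceback. It is an illustration only, not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas real searches use substitution matrices and affine gap penalties. The scoring values and example sequences are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TTACGTTT", "GGACGGG")
    print(score)
    print(top)
    print(bottom)
```

The dynamic programming table costs time and memory proportional to the product of the two sequence lengths, which is why heuristic programs such as FASTA and BLAST are used for database-scale searches.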
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
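As an illustration of how one of these parameters is derived from the sequence alone, the sketch below computes the aliphatic index using Ikai's formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. This is a minimal sketch, not the ProtParam source code; the function name is hypothetical, and the coefficients 2.9 and 3.9 are the published values as we understand them.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = "".join(sequence.split()).upper()  # ignore white space
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Hypothetical toy peptide: one each of Ala, Val, Ile and Leu.
    print(round(aliphatic_index("AVIL"), 1))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains, which is usually read as a sign of thermostability in globular proteins.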
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the
success rate in the prediction of the secondary structure of proteins.
Further improvements were obtained by predicting all the sequences of a set
of aligned proteins belonging to the same family. This improved SOPM method
(SOPMA) correctly predicts 69.5% of amino acids for a three-state description
of the secondary structure (α-helix, β-sheet and coil) in a whole database
containing 126 chains of non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by e-mail to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
2) Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
3) Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N).
4) Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
5) Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
6) Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
7) Selected the best ligand for HBV disease from the KEGG database.
8) Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
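The E-value screening step described above can be sketched as a simple filter. This is an illustrative sketch only: the cutoff of 1e-5 is an assumed value, not one stated in this work, and the `Hit` type and `select_templates` function are hypothetical names. The first three hits use the identifiers, scores and E-values listed in the BLAST result of this study; the last hit is a made-up weak match added to show the filter rejecting it.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str
    bit_score: float
    e_value: float


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


if __name__ == "__main__":
    hits = [
        Hit("2G33-C", 192, 6e-50),
        Hit("1QGT-C", 206, 6e-54),
        Hit("2QIJ-C", 197, 2e-51),
        Hit("XXXX-A", 27, 6.0),  # hypothetical weak hit, for illustration
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.bit_score, hit.e_value)
```

Ranking by E-value is only a first pass; coverage of the query and the quality of the template structure still need to be inspected by hand.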
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus the interactions are said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible
receptor, or a flexible ligand with a rigid receptor.
Fig: Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease hepatitis B, and will also open
new avenues for finding other possible drug targets in HBV. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule in the cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
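The link between a lower activation free energy and rate enhancement can be made quantitative. The sketch below assumes simple transition state theory, in which the rate constant is proportional to exp(-ΔG/RT), so lowering the barrier by ΔΔG multiplies the rate by exp(ΔΔG/RT). The function name and the example barrier reduction are hypothetical and serve only as an illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def rate_enhancement(barrier_reduction_j_per_mol: float, temperature_k: float = 298.15) -> float:
    """Factor by which the rate increases when the barrier drops by the given amount."""
    return math.exp(barrier_reduction_j_per_mol / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    # Hypothetical example: the enzyme lowers the barrier by 5.708 kJ/mol.
    print(f"{rate_enhancement(5708.0):.2f}-fold faster")
```

At room temperature, each reduction of about 5.7 kJ/mol in the barrier corresponds to roughly a tenfold increase in rate under this model.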
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The table below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
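Each point on a Ramachandran plot comes from a torsion angle calculated from four consecutive backbone atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. The sketch below shows one common way to compute such an angle from Cartesian coordinates. It is a minimal illustration rather than the code used by any validation server; the function name and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _scale(a, s):
    return tuple(x * s for x in a)


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates chosen so the torsion is a right angle.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to every residue of a model gives the (φ, ψ) pairs that are then compared against the allowed regions of the plot.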
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
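The idea of gradient optimization can be shown on the smallest possible system: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) potential. The sketch below is a toy steepest-descent minimizer under stated assumptions, namely reduced units (ε = σ = 1), a fixed step size, and a cap on the distance moved per step. It is not the minimizer used by any modelling package, and all names are hypothetical.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(
    r0: float,
    step_size: float = 0.01,
    max_move: float = 0.05,
    tolerance: float = 1e-10,
    max_iterations: int = 10000,
) -> float:
    """Steepest descent on the interatomic distance until the force is near zero."""
    r = r0
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        move = -step_size * gradient  # step downhill, against the gradient
        move = max(-max_move, min(max_move, move))  # cap the step for stability
        r += move
    return r


if __name__ == "__main__":
    start = 0.9  # atoms too close: large repulsive energy
    final = minimize_separation(start)
    print(f"start: r={start:.3f}, E={lj_energy(start):.3f}")
    print(f"final: r={final:.5f}, E={lj_energy(final):.5f}")
```

The step cap matters because the repulsive wall is very steep: without it, a single step from a clashing geometry can throw the atoms far apart, where the gradient is almost flat and convergence becomes slow.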
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (include query sequence)
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5            206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                  192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I            27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
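The extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam is documented to use (Tyr 1490, Trp 5500, cystine 125, in M-1 cm-1) and takes the residue counts and molecular weight from the ProtParam output in this section (Tyr 5, Trp 4, Cys 5, molecular weight read as 55252.6). The function names are hypothetical.

```python
EXT_TYR = 1490      # M^-1 cm^-1 per tyrosine at 280 nm (assumed ProtParam value)
EXT_TRP = 5500      # M^-1 cm^-1 per tryptophan
EXT_CYSTINE = 125   # M^-1 cm^-1 per cystine (disulfide-bonded Cys pair)


def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int, all_cys_as_cystine: bool) -> int:
    """Molar extinction coefficient at 280 nm from residue counts."""
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 0.1% (1 g/l) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    molecular_weight = 55252.6
    for paired in (True, False):
        ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_as_cystine=paired)
        absorbance = absorbance_at_1_g_per_l(ext, molecular_weight)
        print(ext, round(absorbance, 3))
```

With five cysteines, at most two cystines can form, which accounts for the difference of 250 between the two reported coefficients.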
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
310 helix        (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states     :   0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
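The pairwise scores above are percent identities. As a rough illustration of how such a score can be obtained from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` is hypothetical, and the two strings are only the C-terminal fragments of HBEAG_ASHV and HBEAG_GSHV taken from the alignment in this report, so the result is not expected to reproduce the full-length score in the table.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted as compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# C-terminal alignment fragments (illustration only, not the full sequences).
ashv_fragment = "RRRSQSPRRR-PQSPASNC"  # Q64896|HBEAG_ASHV
gshv_fragment = "RRRSQSPRRRRSQSPASNC"  # P03153|HBEAG_GSHV

# 17 identical residues out of 18 compared columns.
print(round(percent_identity(ashv_fragment, gshv_fragment), 1))
```

Excluding gapped columns from the denominator is one common convention; other tools divide by the alignment length or by the shorter sequence, which gives different numbers.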
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
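The guide tree can be read programmatically once it is in Newick form. The sketch below is only an illustration: the `GUIDE_TREE` string is a reconstruction that assumes the separators lost from the printed tree were `:`, `,` and a decimal point after the first digit of each branch length, and the helper name `leaf_branch_lengths` is hypothetical. It extracts the terminal branch length of each sequence with a regular expression rather than fully parsing the tree.

```python
import re

# Reconstructed Newick string (separators restored by assumption).
GUIDE_TREE = (
    "(((((("
    "Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length.

    Internal branch lengths follow a ')' and therefore have no name in
    front of the colon, so the pattern below skips them.
    """
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves; longest terminal branch:", longest, lengths[longest])
```

For real analyses a full Newick parser would be preferable, since this pattern ignores the tree topology entirely.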
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model, then load the raw target sequence.
Choose fit, then fit raw sequence, then magic fit, then iterative fit.
Choose file, then save, then layer (pdb).
Choose file, then save, then project (pdb).
Choose Swiss model, then submit modeling request. A new browser window opens for loading the PDB file and entering the email ID at which the modeled structure will be received.
Open the new structure (received by email) and remove the template by selecting the target.
Choose file, then save, then layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044    Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
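As a small worked check of the model information, the fraction of the target covered by the model can be computed from the modelled residue range. The sketch assumes the target is the 523-residue Q8IVS8 sequence listed in the pairwise score table; the function name `model_coverage` is hypothetical.

```python
def model_coverage(first_residue, last_residue, target_length):
    """Return (number of modelled residues, percent of the target covered)."""
    modelled = last_residue - first_residue + 1  # range is inclusive
    return modelled, 100.0 * modelled / target_length


# Residues 83 to 514 were modelled; 523 is the length assumed for Q8IVS8.
modelled, percent = model_coverage(83, 514, 523)
print(modelled, "residues modelled,", round(percent, 1), "% of the target")
```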
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Residues whose (φ, ψ) pair falls outside the allowed regions point to local problems in the backbone geometry of the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
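To make the φ and ψ angles concrete, the sketch below computes a dihedral angle from four atom positions using plain Python. It is an illustration, not the method used by Procheck: the function name `dihedral` and the toy coordinates are assumptions. For residue i, φ would be the dihedral of C(i-1), N(i), CA(i), C(i) and ψ the dihedral of N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from any structure): a 90 degree twist.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Applying this to every residue of a model and plotting φ against ψ gives the scatter that a Ramachandran plot displays.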
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE   %-age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
                                                          ----   -----
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
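The percentages in the Procheck summary follow directly from the residue counts. The sketch below recomputes them and applies the 90% rule of thumb quoted above; the dictionary and function names are hypothetical, and the counts are the ones listed in the summary.

```python
# Residue counts per Ramachandran region, as listed in the Procheck summary.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region, to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_quality_threshold(counts, threshold=90.0):
    """True if the most favoured regions hold more than `threshold` percent."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


percentages = region_percentages(REGION_COUNTS)
for region, pct in percentages.items():
    print(f"{region:20s} {pct:5.1f}%")
print("over 90% in most favoured regions:", meets_quality_threshold(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline, so the check returns False.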
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
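The size of the search reported in the log can be checked with simple arithmetic. The sketch below assumes that the digit runs printed after Iterate and FFT stand for 27 distance steps, 812 receptor rotations and 1 ligand rotation, and for a 64 x 24 x 48 FFT grid; that reading is an interpretation of the garbled log, supported by the distance range (0 to 19.50 in steps of 0.75) and by the reported totals. The function name `radial_steps` is hypothetical.

```python
def radial_steps(r_min, r_max, step):
    """Number of intermolecular distances sampled, both end points included."""
    return int(round((r_max - r_min) / step)) + 1


# Distance range taken from the log: 0.00 to 19.50 in steps of 0.75.
n_distances = radial_steps(0.0, 19.5, 0.75)

receptor_rotations = 812  # "Receptor 812" in the log
ligand_rotations = 1      # "Ligand 1" in the log
n_ffts = n_distances * receptor_rotations * ligand_rotations

fft_grid = (64, 24, 48)   # assumed reading of the FFT dimensions
points_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]

total_orientations = n_ffts * points_per_fft
print(n_distances, n_ffts, points_per_fft, total_orientations)
```

The product agrees with the 21924 3D FFTs and 1616412672 orientations stated in the log, which is what makes this reading of the digits plausible.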
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25): Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, each plays an important role
in survival in a different host species. It would be interesting to take a
closer look by studying the viruses at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution, and the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
The HBeAg protein, compared in this study with human glycerate kinase
(Q8IVS8), is reported to be one of the secondary causes of HBV
pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, which could serve as a basis for
drug development.
Future prospects
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking study may be useful in
the search for a cure for hepatitis B, and may also open new
avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. The same method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a large problem, and much work still
needs to be done to establish a sound phylogenetic relationship and a full-fledged
cure for hepatitis B. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro -
an integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
million of which are in Africa. Carriage rates vary markedly between different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:
1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
homes.
4) Vertical transmission - perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution: viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Fig: Hepatitis B virus in serum
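The marker rules listed above can be read as a simple lookup. The sketch below is only an illustration of that reading: the function name and the boolean inputs are hypothetical, the wording of each note is taken from the list above, and it is in no way a diagnostic tool.

```python
def interpret_hbv_serology(hbsag, hbeag, anti_hbs, anti_hbe, core_igm, core_igg):
    """Map boolean serology results to the interpretations given in the text.

    Each argument is True when the marker is detected in serum.
    The function name and interface are illustrative assumptions.
    """
    notes = []
    if hbsag:
        notes.append("HBsAg present: virus replication is occurring in the liver")
    if hbeag:
        notes.append("HBeAg present: high level of viral replication")
    if anti_hbs:
        notes.append("anti-HBs present: immunity following infection")
    if anti_hbe:
        notes.append("anti-HBe present: low infectivity in a carrier")
    if core_igm:
        notes.append("core IgM present: recent infection")
    if core_igg:
        notes.append("core IgG present: exposure to HBV")
    if not notes:
        notes.append("no markers detected")
    return notes


if __name__ == "__main__":
    # Pattern described in the text for a resolved infection:
    # surface antibody and core IgG detectable, antigens absent.
    for line in interpret_hbv_serology(False, False, True, False, False, True):
        print(line)
```

HBcAg is left out of the inputs because, as stated above, the core protein is not found in blood.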
Prevention
1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms
following exposure, may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of their
ongoing infection, although some have persistent fatigue.
Molecular virology
The genome is circular, partially double stranded and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins is through the use of multiple
in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific transcript
for encapsidation into the nucleocapsid.
Fig: Hepatitis B virus genome
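The idea of several proteins arising from multiple in-frame start codons within one reading frame can be sketched with a small ORF scan. This is a minimal illustration under stated assumptions: the function name and the `min_codons` parameter are hypothetical, only the forward strand of a linear sequence is scanned (the circular HBV genome would need the sequence wrapped around), and the toy sequence is not HBV data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start is inclusive and end is exclusive (the stop codon is included).
    Every in-frame ATG is reported, so nested ORFs that share one stop codon
    appear as separate entries.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # ATG positions seen since the last stop in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon == "ATG":
                open_starts.append(i)
            elif codon in STOP_CODONS:
                for start in open_starts:
                    # Count the codons before the stop, including the ATG itself.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, frame))
                open_starts = []
    return sorted(orfs)


if __name__ == "__main__":
    toy = "ATGAAAATGCCCTAA"  # two in-frame start codons, one shared stop
    for start, end, frame in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```

On the toy sequence the scan reports two nested ORFs in frame 0, which is the same pattern by which one reading frame can yield proteins of different lengths.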
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA pol II of a longer-than-genome-length RNA called the
pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al. 765).
Fig: HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes
these particles non-infectious. High levels of these non-infectious
particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be disadvantageous
for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
largely oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. It is a 185 amino acid protein expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigenic particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or
filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Homology, or comparative, modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g., loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
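The 30% rule can be checked directly on a pairwise alignment. The sketch below is an illustration only: the function names are hypothetical, and it assumes one common convention in which identity is the number of identical residue pairs divided by the number of aligned (non-gap) columns; other conventions divide by the alignment length or by the shorter sequence.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    aligned_columns = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # gapped columns are not counted in this convention
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True when the pair exceeds the identity threshold quoted in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    query = "ACDE-FG"
    template = "ACDQWFG"
    print(round(percent_identity(query, template), 1))
    print(passes_identity_rule(query, template))
```

The toy alignment has six gap-free columns, five of them identical, so it sits well above the threshold.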
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for the storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4) from human. Primary accession number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
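To make the Smith-Waterman idea concrete, here is a minimal local alignment sketch. It is not the SSEARCH implementation: the function name, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("XXHELLOYY", "ZZHELLOWW")
    print(score)
    print(top)
    print(bottom)
```

Because every cell is floored at zero, the alignment is free to start and end anywhere, which is what makes the method local rather than global.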
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
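The notion of reporting library sequences that resemble the query above a threshold can be sketched with a simple shared-word count. This is a deliberately simplified stand-in for the word-seeding stage of a heuristic search and is not the BLAST algorithm itself: the function names, the word length `k` and the `min_shared_words` threshold are assumptions, and there is no extension step or statistical scoring.

```python
from collections import defaultdict


def build_word_index(library, k=3):
    """Map every k-letter word to the names of the library sequences containing it."""
    index = defaultdict(set)
    for name, seq in library.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def seed_hits(query, index, k=3, min_shared_words=2):
    """Return library names sharing at least min_shared_words distinct words with the query."""
    counts = defaultdict(int)
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    for word in query_words:
        for name in index.get(word, ()):
            counts[name] += 1
    hits = [name for name, count in counts.items() if count >= min_shared_words]
    # Most shared words first; ties are broken alphabetically for a stable order.
    return sorted(hits, key=lambda name: (-counts[name], name))


if __name__ == "__main__":
    # Toy library; the names and sequences are made up for illustration.
    library = {"s1": "MKTAYIAKQR", "s2": "GGGGGGGG", "s3": "TAYIAK"}
    index = build_word_index(library)
    print(seed_hits("KTAYIA", index))
```

Indexing the library once and looking up query words is what makes this kind of search fast compared with aligning the query against every sequence in full.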
5. Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e., if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
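Two of the listed parameters follow from amino acid composition alone and can be sketched in a few lines. This is an independent illustration, not ProtParam's code: the function names are hypothetical, the aliphatic index uses the commonly cited formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and the extinction coefficient at 280 nm uses the commonly cited per-residue values of 5500 for Trp, 1490 for Tyr and 125 per cystine. The half-life and instability index need lookup tables and are not attempted here.

```python
def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, all_cys_paired=False):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With all_cys_paired=True every pair of Cys is assumed to form a cystine;
    otherwise all Cys are assumed to be reduced and contribute nothing.
    """
    seq = seq.upper()
    value = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if all_cys_paired:
        value += 125 * (seq.count("C") // 2)
    return value


if __name__ == "__main__":
    toy = "MAVILWYCC"  # made-up sequence for illustration
    print(round(aliphatic_index(toy), 1))
    print(extinction_coefficient(toy), extinction_coefficient(toy, all_cys_paired=True))
```

Reporting the extinction coefficient under both cysteine assumptions mirrors the fact that the oxidation state of the protein is not known from sequence alone.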
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. The SOPMA paper reports improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
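The two kinds of figure quoted above, a three-state accuracy and an accuracy restricted to co-predicted residues, can be computed as follows. This is a sketch of the bookkeeping only: the function names are hypothetical, the state letters H, E and C are an assumed encoding for helix, sheet and coil, and the toy strings are not real predictions.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def joint_prediction(pred_a, pred_b, observed):
    """Return (coverage, accuracy) for positions where two methods agree.

    coverage is the percentage of residues on which both methods give the
    same state; accuracy is the percentage of those residues that are correct.
    """
    if not (len(pred_a) == len(pred_b) == len(observed)) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    agreed = [(a, o) for a, b, o in zip(pred_a, pred_b, observed) if a == b]
    coverage = 100.0 * len(agreed) / len(observed)
    if not agreed:
        return coverage, 0.0
    accuracy = 100.0 * sum(a == o for a, o in agreed) / len(agreed)
    return coverage, accuracy


if __name__ == "__main__":
    observed = "HHHEEECC"
    method_1 = "HHHEECCC"
    method_2 = "HHEEECCC"
    print(q3_accuracy(method_1, observed))
    print(joint_prediction(method_1, method_2, observed))
```

Restricting the score to positions where both methods agree trades coverage for accuracy, which is the pattern behind the 82.2% and 74% figures in the text.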
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (in pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
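The Ramachandran plot used in the validation step is a scatter of backbone phi and psi dihedral angles. As a rough illustration of what is being plotted, the sketch below computes a dihedral angle from four atom positions; the function names are hypothetical, coordinates are plain (x, y, z) tuples, and this is not the code used by SAVS.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for one residue."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


if __name__ == "__main__":
    # Toy coordinates: a cis arrangement, then a trans arrangement.
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
    print(abs(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))))
```

A validation server compares the resulting (phi, psi) pairs against regions that are commonly populated in experimentally determined structures.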
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
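The agreement between matched Cα atoms quoted above is usually expressed as a root mean square deviation (RMSD). The sketch below shows that calculation under a strong simplifying assumption: the two structures are taken to be already superposed, so no optimal rotation or translation is computed. The function name and the toy coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units between two equally long lists of (x, y, z) points.

    The lists must contain matched atoms in the same order and the structures
    must already be superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom of the model is displaced by 2 units along z.
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
    print(ca_rmsd(template, model))
```

In practice the model and the reference are first superposed by a least-squares fit, and the RMSD is reported after that fit.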
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the
best template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is the
most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which
improve upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
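The E-value screening described above can be sketched in a few lines of Python. This is only an illustration: the Hit class, the select_templates function, the template names and the cutoff of 1e-5 are all assumptions made for the example, not values prescribed by BLAST or by this project.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str  # hypothetical identifier of a candidate template
    score: float      # score reported by the database search
    e_value: float    # expected number of chance hits with this score


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) first.

    The default cutoff is an assumed value for illustration only.
    """
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Made-up hits: two strong matches and one that is no better than chance.
hits = [
    Hit("template_B", 197.0, 2e-51),
    Hit("template_C", 27.0, 6.0),
    Hit("template_A", 206.0, 6e-54),
]
candidates = select_templates(hits)
for h in candidates:
    print(h.template_id, h.score, h.e_value)
```

A hit that fails the cutoff is dropped rather than ranked last, which mirrors the advice that a template with a poor E-value should not be used even when it is the only one available.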
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do
not have the ability to alter their conformation in order to improve binding
and ease movement. Conformational changes are limited by steric
constraints, and the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described
by a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand with a flexible receptor
or a flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows a
receptor to bind a larger than usual ligand. Normally, when there is ligand
overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also
open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and
hence can aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components.
It is a molecule within a cell or on a cell surface to which a substance (such
as a hormone or a drug) selectively binds, causing a change in the activity
of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). Because a
ligand binds through many weak noncovalent bonds formed with the binding
site of a protein, the tight binding of a ligand depends upon a precise fit to
the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against
ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der
Waals radii. The angles which cause spheres to collide correspond to
sterically disallowed conformations of the polypeptide backbone.
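Each point on the plot is a pair of torsion angles, and a torsion angle is fully determined by four atom positions: φ by C(i-1), N, Cα, C, and ψ by N, Cα, C, N(i+1). The following is a minimal sketch of that calculation using only the Python standard library. The function name dihedral and the test coordinates are illustrative assumptions and are not taken from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, k):
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Artificial coordinates chosen so that the answers are obvious by eye.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
quarter = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, quarter, trans)
```

Applying the same function to the backbone atoms of every residue gives the (φ, ψ) pairs that are plotted against each other.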
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension and
lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is
good at releasing local constraints for a residue, but it will not pass through
high energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of the various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
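The idea of moving atoms downhill along the energy gradient can be illustrated with the simplest possible system: two atoms interacting through a Lennard-Jones (van der Waals) term. The sketch below is an assumption-laden toy, not the minimizer used by any program in this report: the reduced units (EPSILON = SIGMA = 1), the step size and the cap on each move are all chosen only for the example.

```python
EPSILON = 1.0  # well depth, reduced units (assumed for illustration)
SIGMA = 1.0    # separation at which the energy is zero (assumed)


def lj_energy(r):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (s6 * s6 - s6)


def lj_gradient(r):
    """Derivative of the energy with respect to r (the negative of the force)."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-10, max_iter=10000):
    """Steepest descent: move against the gradient until the force vanishes."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r


r_start = 0.95  # two atoms placed too close together
r_min = minimize(r_start)
print(r_start, lj_energy(r_start))
print(r_min, lj_energy(r_min))
```

Starting from a clash, the separation relaxes to the nearby minimum at 2^(1/6) times SIGMA. The procedure only ever moves downhill from its starting point, which is why minimization finds the nearest local minimum rather than crossing energy barriers.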
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206   6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                       192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27   6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                 27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
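The composition figures reported by ProtParam can be cross-checked against one another. The short Python sketch below takes the residue counts and the atomic formula from the output above and recomputes the sequence length, the charged-residue totals, the percentages, the atom count and an approximate molecular weight. The atomic masses are standard average values entered by hand, so the last digits of the weight need not match ProtParam exactly. The variable names are illustrative.

```python
# Residue counts as reported in the ProtParam output above.
counts = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}
length = sum(counts.values())
percent = {aa: round(100.0 * n / length, 1) for aa, n in counts.items()}
negative = counts["Asp"] + counts["Glu"]
positive = counts["Arg"] + counts["Lys"]

# Atomic formula as reported above.
formula = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}
# Standard average atomic masses (assumed values, not from ProtParam).
atomic_mass = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
n_atoms = sum(formula.values())
mass = sum(formula[el] * atomic_mass[el] for el in formula)

print("length:", length)
print("Asp + Glu:", negative, " Arg + Lys:", positive)
print("atoms:", n_atoms)
print("approximate molecular weight: %.1f" % mass)
```

All recomputed totals agree with the reported values, and the mass derived from the formula lies within one mass unit of the reported molecular weight.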
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
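Both extinction coefficients follow from the Tyr, Trp and Cys counts. ProtParam uses molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 for each cystine (a disulfide-bonded pair of Cys). The sketch below reproduces the reported numbers from the composition given earlier; the function name is an illustrative choice.

```python
N_TYR, N_TRP, N_CYS = 5, 4, 5   # counts from the amino acid composition
MOLECULAR_WEIGHT = 55252.6       # from the ProtParam output

EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125  # M-1 cm-1 at 280 nm


def extinction_coefficient(n_tyr, n_trp, n_cystine):
    """Molar extinction coefficient at 280 nm from residue counts."""
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Five Cys residues can form at most two cystines.
ext_all_cystine = extinction_coefficient(N_TYR, N_TRP, N_CYS // 2)
ext_no_cystine = extinction_coefficient(N_TYR, N_TRP, 0)

# Absorbance of a 1 g/l (0.1%) solution is the coefficient divided by the mass.
abs_all_cystine = round(ext_all_cystine / MOLECULAR_WEIGHT, 3)
abs_no_cystine = round(ext_no_cystine / MOLECULAR_WEIGHT, 3)

print(ext_all_cystine, abs_all_cystine)
print(ext_no_cystine, abs_no_cystine)
```

The difference of 250 between the two coefficients is exactly the contribution of the two possible cystines.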
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
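The pairwise scores in the table can be read as percent identities. The sketch below assumes the usual ClustalW definition, namely the number of identical positions divided by the number of positions compared, with gap positions excluded; it is not ClustalW's own code. The two strings are a short N-terminal fragment of the HBVA4 and ASHV rows from the alignment in this report, so the value printed describes only that fragment and not the full-length score of 65.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identities, compared, percent) for two aligned sequences.

    Columns containing a gap in either sequence are skipped, which is the
    assumed scoring convention for this illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identities += 1
    percent = 100.0 * identities / compared if compared else 0.0
    return identities, compared, percent


# Fragment of the aligned HBVA4 and ASHV sequences (first 30 columns
# after the leading gaps).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

identities, compared, percent = percent_identity(hbva4, ashv)
print(identities, compared, round(percent, 1))
```

Run over complete aligned sequences, the same calculation separates the near-identical HBV genotype sequences (scores of 97 to 98) from the distant glycerate kinase query (scores of 2 to 3).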
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927)
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C, which showed around 85.6% identity with the target sequence, was selected as the template, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file); choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (PDB)
Choose File - Save - Project (PDB)
Choose Swiss model - submit modeling request (a new browser window opens; load the PDB file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose File - Save - Layer (PDB)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
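The modelled residue range gives the coverage of the query by the template directly, which the template selection section singled out as one of the most important criteria. A minimal sketch, using the range reported above and the sequence length of 523 from the ProtParam output (the variable names are illustrative):

```python
QUERY_LENGTH = 523             # residues in GLCTK_HUMAN
MODEL_START, MODEL_END = 83, 514  # modelled residue range, inclusive

modelled_residues = MODEL_END - MODEL_START + 1
coverage_percent = 100.0 * modelled_residues / QUERY_LENGTH
unmodelled_n_term = MODEL_START - 1
unmodelled_c_term = QUERY_LENGTH - MODEL_END

print(modelled_residues, round(coverage_percent, 1))
print(unmodelled_n_term, unmodelled_c_term)
```

The first 82 residues and the last 9 residues of the query therefore have no counterpart in the model.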
Fig structure of template after modeling
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using Deep View and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score   %-age
Residues in most favoured regions [A,B,L]                 990    85.6
Residues in additional allowed regions [a,b,l,p]          104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0
Residues in disallowed regions                             51     4.4
                                                         ----   -----
Number of non-glycine and non-proline residues           1156   100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
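The plot statistics can be checked against the 90% criterion quoted by PROCHECK. The sketch below recomputes the percentages from the residue counts in the table and tests the most favoured fraction against that threshold. The dictionary keys and variable names are illustrative choices.

```python
# Residue counts per Ramachandran region, taken from the table above.
regions = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

non_gly_pro = sum(regions.values())
percent = {name: round(100.0 * n / non_gly_pro, 1) for name, n in regions.items()}
total_residues = non_gly_pro + end_residues + glycines + prolines

GOOD_MODEL_THRESHOLD = 90.0  # percent in most favoured regions, per PROCHECK
meets_criterion = percent["most favoured"] > GOOD_MODEL_THRESHOLD

for name, value in percent.items():
    print("%-20s %5.1f" % (name, value))
print("total residues:", total_residues)
print("meets 90% criterion:", meets_criterion)
```

With 85.6% of residues in the most favoured regions, the model falls short of the figure expected for a good quality model, and 4.4% of residues lie in disallowed regions.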
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we
conclude that although the strains are all closely related, they play an
important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of its evolution, and the genes responsible for pathogenesis
can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we performed homology modelling. We then went a step further and
presented a theoretical model built with available online modelling tools.
Our study indicated that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of HBV pathogenicity. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries. The present work may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease hepatitis B; they will also open new avenues for
finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for hepatitis B. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] - PubMed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] - Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)
Diagnosis: Serology
Acute infection with resolution. Viral antigens:
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody response:
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Fig. Hepatitis B virus in serum
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic carriers,
or those with chronic hepatitis B, are not aware of their on-going infection,
although some have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome
Genome: circular, 3.2 kb in size, and double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to its 5' end. The other strand, the plus strand, is
variable in length but always less than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
Expressing seven proteins from four ORFs is achieved through the use of
multiple in-frame start codons. The HBV genome also contains elements that
regulate transcription, determine the site of polyadenylation, and mark a
specific transcript for encapsidation into the nucleocapsid.
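The idea of several proteins arising from one reading frame through multiple in-frame start codons can be illustrated with a small forward-strand ORF scan. This is only a minimal sketch: the function name `find_orfs`, the `min_codons` parameter and the toy sequence are assumptions made for illustration, not part of the analysis in this report, and a real HBV scan would also have to handle the circular genome and the complementary strand.

```python
# Minimal forward-strand ORF scan (illustrative sketch, not the report's pipeline).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end, in_frame_starts) for each forward-strand ORF.

    start and end are 0-based, end-exclusive, and the span includes the stop
    codon. in_frame_starts lists every ATG position inside the ORF that lies
    in the same frame, i.e. the alternative translation start sites.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        starts = []  # ATG positions seen since the last stop codon in this frame
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if codon == START_CODON:
                starts.append(pos)
            elif codon in STOP_CODONS and starts:
                end = pos + 3
                if (end - starts[0]) // 3 >= min_codons:
                    orfs.append((frame, starts[0], end, list(starts)))
                starts = []
    return orfs


# Hypothetical toy sequence: frame 0 holds one ORF with two in-frame ATGs.
toy = "ATGAAAATGCCCTAAGATGTTTTGA"
for frame, start, end, atgs in find_orfs(toy):
    print(f"frame {frame}: {start}-{end}, in-frame starts at {atgs}")
```

Each ORF is reported once, together with all of its in-frame ATG positions, so a longer and a shorter protein sharing the same stop codon appear as one entry with two possible starts.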
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA pol II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and they usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Regardless of the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by a blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. It is a 185 amino acid protein expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by a blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and tubular
or filamentous particles that vary in length. The Dane particle is the infective
form of the virus. Hepatitis B is normally transmitted by blood transfusion,
contaminated equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed, with an emphasis on past accomplishments as well as goals for the
future.
Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
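The marker list above is essentially a lookup table, and it can be sketched as one. The snippet below is only an illustration of how the statements in this section could be encoded: the dictionary `MARKER_MEANING` and the function `describe_profile` are hypothetical names, the wording of each entry is taken from the list above, and the code is in no way a diagnostic tool.

```python
# Lookup of the serological markers described above (illustrative only;
# the interpretations simply restate the text of this section).
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def describe_profile(positive_markers):
    """Return one 'marker: meaning' line per positive marker, in input order."""
    unknown = [m for m in positive_markers if m not in MARKER_MEANING]
    if unknown:
        raise ValueError(f"unknown marker(s): {', '.join(unknown)}")
    return [f"{m}: {MARKER_MEANING[m]}" for m in positive_markers]


# Hypothetical serum profile used only to show the call.
for line in describe_profile(["HBsAg", "core IgM"]):
    print(line)
```

Markers that are not in the table raise an error instead of being silently ignored, so a mistyped marker name cannot produce an incomplete description.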
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g., loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. Homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing
improvements in the quality of homology models.
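The 30% rule discussed above can be checked for a given query-template alignment with a few lines of code. The following is a minimal sketch under stated assumptions: the function names, the toy alignment and the choice of denominator (only columns in which neither sequence has a gap) are ours; other definitions of percent identity divide by the alignment length or by the shorter sequence length and give somewhat different values.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they must have
    equal length. Returns 0.0 when no ungapped column exists.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_identity_rule(aligned_query, aligned_template, threshold=30.0):
    """True if identity exceeds the threshold (30% by default, as in the text)."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment, not taken from the sequences studied here.
query = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
print(round(percent_identity(query, template), 1), passes_identity_rule(query, template))
```

In the toy alignment, 7 of the 9 ungapped columns match, so the pair sits well above the threshold; a real template search would apply the same check to each candidate template before model building.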
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4), from human.
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
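To make the idea of an optimal local alignment concrete, the following is a minimal sketch of the Smith-Waterman recurrence and traceback. It is not the SSEARCH implementation: the function name `smith_waterman`, the simple match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("XXACGTYY", "ZZACGTWW")
print(score, top, bottom)
```

Because every cell is clamped at zero, the alignment is free to start and end anywhere in either sequence, which is what distinguishes local from global alignment.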
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e., if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. Its authors report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet, and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) using a BLAST similarity search.
Loaded the target sequence (pdb format) in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
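The E-value screening step described above can be sketched as a simple filter. The function name `select_templates` and the cutoff of 1e-5 are assumptions made for illustration; the hit list reuses the PDB identifiers, scores, and E-values reported in the results section of this report, with the two weakest E-values read as 6.0 from the flattened table (also an assumption).

```python
# Each hit is (PDB id and chain, bit score, E-value).
# The last two E-values are read as 6.0 from the report's table (assumption).
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first.

    The default cutoff is an illustrative assumption, not a fixed rule.
    """
    kept = [h for h in hits if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])


for pdb_id, score, evalue in select_templates(hits):
    print(pdb_id, score, evalue)
```

Hits that pass the cutoff are only candidates; coverage of the query and the quality of the template structure still have to be checked before one is chosen.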
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can either have a rigid ligand and a flexible receptor, or a
flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule in a cell or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g., tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
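Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of that torsion calculation in plain Python; the function name `dihedral` and the test coordinates are illustrative assumptions, not coordinates from the modeled receptor.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Returns a value in (-180, 180]; 0 is cis and +/-180 is trans.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)          # the central bond
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)        # normal to the plane p0, p1, p2
    n2 = _cross(b1, b2)        # normal to the plane p1, p2, p3
    y = _dot(b1, _cross(n1, n2))
    x = math.sqrt(_dot(b1, b1)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates (assumed): a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral` to the backbone atoms of every residue and scattering ψ against φ gives the plot that validation programs then compare with the sterically allowed regions.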
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e., bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
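The gradient optimization described above can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential, started too close together. This is a sketch under stated assumptions, not the force field or minimizer used for the modeled receptor; the function names, the reduced units (ε = σ = 1), the step size, and the cap on the move per step are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr for the Lennard-Jones potential; the force is its negative."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=1000):
    """Steepest descent on the interatomic distance r.

    The move per iteration is capped (an assumed safeguard) because the
    repulsive gradient is very large when atoms start too close together.
    """
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:          # forces are small: a (local) minimum
            break
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r


r_start = 0.9                      # atoms too near each other: large repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

The minimizer only follows the local slope downhill, which is why, in a real molecule with many coupled degrees of freedom, it settles in the nearest minimum rather than crossing energy barriers.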
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC                    Description                                               Score  E-value
pdb  1QGT-C                Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5     206    6e-54
pdb  2QIJ-C                Chain C, Hepatitis B Capsid Protein With An N-Termina     197    2e-51
pdb  2G33-C  CAPSD_HBVD1   Chain C, Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT    Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     6.0
pdb  1AW9-A                Chain A, Structure Of Glutathione S-Transferase Iii I     27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
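The two extinction coefficients reported above can be checked by hand from the residue counts. The sketch below assumes the standard molar extinction values at 280 nm used for this kind of estimate (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and reads the reported molecular weight as 55252.6; the function name `extinction_coefficient` is illustrative.

```python
# Molar extinction values at 280 nm in water, in M-1 cm-1 (assumed standard values).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient of a protein at 280 nm.

    If all Cys residues are taken as half cystines, every pair of Cys
    contributes one cystine; an unpaired Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


# Counts and molecular weight as reported for GLCTK_HUMAN in this section.
n_trp, n_tyr, n_cys = 4, 5, 5
mol_weight = 55252.6

for paired in (True, False):
    ext = extinction_coefficient(n_trp, n_tyr, n_cys, paired)
    # Abs 0.1% is the absorbance of a 1 g/l solution: coefficient / molecular weight.
    print(paired, ext, round(ext / mol_weight, 3))
```

With 4 Trp, 5 Tyr, and 5 Cys, the tryptophan and tyrosine terms give 29450, and the two possible cystines add 250 to reach 29700, in agreement with the values listed.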
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
310 helix (Gg):           0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states:         0 is  0.00%
Other states:             0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
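The pairwise scores in this table behave like percent identities: nearly identical HBV genotype sequences score in the high 90s, while the human glycerate kinase scores 2 to 3 against every HBeAg sequence. As a rough illustration of how such a figure is obtained from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions; ClustalW2 computes its scores from its own pairwise alignments, so values from this sketch need not match the table exactly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either sequence.

    Both inputs must come from the same alignment and have equal length.
    Using gap-free columns as the denominator is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue               # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Short N-terminal fragments taken from the alignment in this report
# (HBEAG_HBVA4 and HBEAG_ASHV); a fragment is not the full-length score.
frag_hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
frag_ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(frag_hbva4, frag_ashv), 1))
```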
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
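The guide tree can also be read programmatically. The following is only a minimal sketch: it assumes the tree is standard Newick text in which the decimal points, colons and commas lost during extraction have been restored (for example 059519 read as 0.59519), and the helper name leaf_branch_lengths is ours, not part of ClustalW.

```python
import re

# Guide tree in Newick format. Assumption: branch lengths such as "059519"
# in the extracted text stand for 0.59519, with ":" and "," restored.
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for every labelled leaf.

    Internal branches have no label before the colon, so the pattern
    (a name immediately followed by ':length') only matches leaves.
    """
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


leaves = leaf_branch_lengths(NEWICK)
longest = max(leaves, key=leaves.get)  # leaf on the longest terminal branch
print(len(leaves), "sequences; longest terminal branch:", longest)
```

The human glycerate kinase sequence sits on by far the longest terminal branch, which agrees with its poor fit to the HBeAg sequences in the alignment.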
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
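How an identity figure of this kind is obtained from two aligned sequences can be sketched as follows. This is an illustration only: the function name percent_identity is ours, the denominator used here (columns where neither row has a gap) is one common convention and may differ from the one used by the alignment tool, and the two rows are the HBVA4 and ASHV fragments from the alignment above with the long shared gap run shortened.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over the columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Fragments of the HBVA4 and ASHV rows (shared gap run shortened for brevity).
hbva4 = "DIDP--YKEFGATVELLSF"
ashv = "DIDP--YKEFGSSYQLLNF"
identity = percent_identity(hbva4, ashv)
print(f"{identity:.1f}% identity over the aligned columns")
```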
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file).
2) Choose Swiss model > load the raw target sequence.
3) Choose Fit > fit raw sequence, then magic fit, then iterative fit.
4) Choose File > Save > Layer (pdb).
5) Choose File > Save > Project (pdb).
6) Choose Swiss model > submit modeling request. (A new browser window opens; load the pdb file and give the email ID at which the modeled structure is to be received.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File > Save > Layer (pdb).
Fig. Open Swiss model and select the "load raw sequence" option to load the target molecule
Fig. Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Fig. Save the file as the project
Fig. Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: request submission form
Please fill these fields
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
Introduction
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles of each residue and shows which combinations of these angles are possible for a polypeptide. For each alpha carbon atom in the backbone chain, the rotation angles about its two backbone bonds (N-Cα and Cα-C) in the protein of interest can therefore be read from this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
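The φ and ψ values plotted on a Ramachandran plot are ordinary dihedral angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one standard way to compute such an angle from atomic coordinates. It is not the code used by PROCHECK or SAVS, and the function name dihedral and the test coordinates are illustrative assumptions.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Idealised geometry: the central bond lies along z and the first atom along x.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Applying the function to the four atoms listed above for every residue of a model gives the (φ, ψ) pairs that a validation server places on the plot.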
SAVES results for proj_gunjan.pdb
Procheck summary
Ramachandran plot
Result: plot statistics

                                                        Score   %age
Residues in most favoured regions [A,B,L]                 990   85.6
Residues in additional allowed regions [a,b,l,p]          104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0
Residues in disallowed regions                             51    4.4
                                                         ----  -----
Number of non-glycine and non-proline residues           1156  100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
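The percentages in the Procheck summary follow directly from the residue counts. The short check below recomputes them and applies the 90% rule of thumb quoted above; the dictionary keys and variable names are ours, and the counts are those reported for this model.

```python
# Residue counts from the Procheck summary (non-Gly, non-Pro residues only).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Rule of thumb quoted by Procheck: over 90% in the most favoured regions.
meets_criterion = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("meets the 90% criterion:", meets_criterion)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the 4.4% of residues in disallowed regions deserve a closer look before the model is used further.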
Docking results (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
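The size of the search reported in the log can be checked with a few lines of arithmetic. The sketch below assumes that the fused figures in the log read Iterate[27,812,1] and FFT[64,24,48], that the first Iterate entry is the number of intermolecular distance steps (0.00 to 19.50 in steps of 0.75), and that the other two are the receptor and ligand rotation counts. The variable names are ours and are not Hex parameters.

```python
import math

# Distance scan: 0.00 to 19.50 in steps of 0.75 (decimal points restored).
r_max = 19.50
r_step = 0.75
n_distance_steps = round(r_max / r_step) + 1  # both end points are included

# Assumed reading of "Iterate[278121] x FFT[642448]" from the log.
iterate = (n_distance_steps, 812, 1)  # distance steps, receptor and ligand rotations
fft_grid = (64, 24, 48)               # orientations covered by each 3D FFT

n_ffts = math.prod(iterate)
n_orientations = n_ffts * math.prod(fft_grid)

print(n_distance_steps, "distance steps")
print(n_ffts, "3D FFTs")
print(n_orientations, "orientations in total")
```

Under these assumptions the numbers agree with the 21924 FFTs and 1616412672 orientations that the log reports. Note that receptor and ligand are the same file (2B8N) and that every solution in the final table has zero energy.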
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different host species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will become possible in the near future to draw firm conclusions about the true picture of evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we carried out homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary reasons for the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data as well as the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV.
Fig. Hepatitis B virus in the serum of the chronic carrier
Prevention
1) Active Immunization
Two types of vaccine are available:
Serum derived: prepared from HBsAg purified from the serum of HBV carriers.
Recombinant HBsAg: made by genetic engineering in yeasts.
Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients. Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals following single-episode exposure to HBV-infected blood, for example after needle-stick injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the skin and eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms
following exposure, may not be aware of the infection. These individuals
may also overcome the infection completely and develop immunity, but
frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great
extent on the status of the person's immune system at the time of exposure.
Most chronic carriers, or those with chronic hepatitis B, are not aware of their
ongoing infection, although some have persistent fatigue.
Molecular virology
The genome is circular, about 3.2 kb in size, and partially double stranded. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5′ end. The other strand, the plus strand, is
variable in length but always shorter than unit length, and has an
RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins rely on the use of multiple
in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific transcript
for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA polymerase II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 5′ end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also
present, and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome
makes these particles non-infectious. High levels of these non-infectious
particles can be found during the acute phase of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of
high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the bloodstream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface antigen (HBsAg)- There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest protein of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Despite the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. A 185 amino acid protein, it is expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA
virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle
(containing DNA and DNA polymerase) called the Dane particle, and tubular or
filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed, with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
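The 30% rule above is stated in terms of sequence identity between query and template. As a minimal sketch of how that figure can be computed from an existing pairwise alignment, the function below counts identical residues over aligned (gap-free) columns. The function name, the gap character "-" and the choice of denominator are assumptions made for illustration; other definitions of percent identity (for example, dividing by the length of the shorter sequence) are also in use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both arguments are rows of the same alignment, so they must have equal
    length; '-' marks a gap (an assumption of this sketch).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0     # columns where both sequences have a residue
    identical = 0   # aligned columns where the residues match
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns that contain a gap
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def passes_30_percent_rule(aligned_query: str, aligned_template: str) -> bool:
    """True if identity exceeds the 30% threshold discussed in the text."""
    return percent_identity(aligned_query, aligned_template) > 30.0
```

A template that fails this check is not necessarily useless, but by the argument above its alignment to the query should be treated with more caution.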
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domains structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet; it is an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
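To make the Smith-Waterman algorithm mentioned above more concrete, the following is a minimal sketch of its dynamic-programming recurrence. It is not the SSEARCH implementation: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas real protein searches typically use a substitution matrix and affine gap penalties. The function returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Each cell holds the best score of a local alignment ending at that
    pair of positions; scores are floored at 0 so that an alignment can
    start anywhere, which is what makes the method local.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                          # start a new local alignment
                prev[j - 1] + substitution, # align a[i-1] with b[j-1]
                prev[j] + gap,              # gap in b
                curr[j - 1] + gap,          # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the matrix are kept in memory, which is enough for scoring; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.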
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e., if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
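As an illustration of how one of the listed parameters is derived from sequence alone, the sketch below computes the aliphatic index using the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. This is a stand-alone approximation written for this document, not ProtParam's own code; the function name is hypothetical, and stripping non-letter characters mirrors the statement above that white space and numbers are ignored.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index of a protein from one-letter amino acid codes.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    with X expressed as mole percent (100 * count / length).
    """
    # Keep letters only, so white space and digits in a pasted
    # sequence are ignored.
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no residues")

    def mole_percent(code: str) -> float:
        return 100.0 * residues.count(code) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as an indicator of greater thermostability for globular proteins.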
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
2) Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database.
3) Retrieved the PDB ID of the template structure using a BLAST similarity search (PDB ID 2B8N).
4) Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
5) Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
6) Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
7) Selected the best ligand for HBV disease from the KEGG database.
8) Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
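The agreement between matched Cα atoms quoted above is usually reported as a root-mean-square deviation (RMSD); reading the figures in the text that way is an assumption of this sketch. The function below computes the RMSD between two equally long lists of Cα coordinates, assuming the two structures have already been superimposed (no fitting step is performed) and that atom i in one list is matched to atom i in the other. The function name and the plain-tuple coordinate format are illustrative choices.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD (in the units of the input, e.g. Å) between matched Cα atoms.

    Assumes both coordinate sets are already superimposed and that
    model[i] corresponds to reference[i].
    """
    if len(model) != len(reference) or len(model) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))
```

In practice the coordinates would be read from the model and template PDB files and optimally superimposed before this calculation; both steps are left out here to keep the example self-contained.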
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
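The E-value screening step described above can be sketched as a small filter over BLAST tabular output (the standard 12-column format, in which column 2 is the subject ID, column 3 the percent identity and column 11 the E-value). The cutoff of 1e-5 is an assumed, illustrative threshold rather than a value given in this work, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, float, float]  # (subject ID, percent identity, E-value)


def filter_hits(tabular_lines: Iterable[str],
                max_evalue: float = 1e-5) -> List[Hit]:
    """Keep BLAST tabular hits with E-value <= max_evalue, best first.

    Expects tab-separated lines with the standard 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    """
    hits: List[Hit] = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        subject = fields[1]
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((subject, pident, evalue))
    hits.sort(key=lambda hit: hit[2])  # lowest E-value first
    return hits
```

The first entry of the returned list is then the natural starting candidate for the template, subject to the other factors (function, coverage, plausibility of the model) discussed in the text.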
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important is the coverage of the
aligned regions: the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand bound to a flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in the search for a cure for hepatitis B, and may also open
new avenues for finding other possible drug targets in the hepatitis B virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angle φ (phi) against ψ (psi) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
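As an illustration of how the two torsion angles can be obtained from atomic coordinates, the following minimal sketch computes a signed dihedral angle from four points. The function name `dihedral` and the sample coordinates are hypothetical and are not taken from any program used in this work. For residue i, φ would be computed from the atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z).

    Sign convention: looking along the central bond p1 -> p2, a clockwise
    rotation from p0 to p3 is positive; 0 is cis, +/-180 is trans.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond vector (rotation axis)
    b2 = sub(p3, p2)          # last bond vector
    n1 = cross(b0, b1)        # normal of the plane p0-p1-p2
    n2 = cross(b1, b2)        # normal of the plane p1-p2-p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates, chosen only to exercise the function
angle_cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
angle_quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
angle_trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Plotting one (φ, ψ) pair per residue on axes running from -180 to +180 degrees gives the Ramachandran plot described above.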
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and may stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
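The idea of gradient optimization can be sketched on the smallest possible system: two atoms that start too close together and interact through a Lennard-Jones potential. This is an illustrative assumption only; the Lennard-Jones form, the reduced units (epsilon = sigma = 1), the step size and the function names are hypothetical and do not describe the force field or the minimizer actually used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (minus the force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.005, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:      # net force is (almost) zero: converged
            break
        r -= step * g         # move so as to reduce the net force
    return r

r_start = 1.0                 # atoms too near each other: strong repulsion
r_min = minimize(r_start)     # relaxes towards the nearby energy minimum
```

Because each step only follows the local downhill direction, the procedure ends in the minimum nearest to the starting point; it has no means of crossing an energy barrier to reach a deeper minimum elsewhere.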
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count    %
Ala (A)    74       14.1
Arg (R)    33        6.3
Asn (N)    11        2.1
Asp (D)    21        4.0
Cys (C)     5        1.0
Gln (Q)    32        6.1
Glu (E)    28        5.4
Gly (G)    51        9.8
His (H)    16        3.1
Ile (I)    15        2.9
Leu (L)    81       15.5
Lys (K)    10        1.9
Met (M)    12        2.3
Phe (F)    10        1.9
Pro (P)    28        5.4
Ser (S)    27        5.2
Thr (T)    22        4.2
Trp (W)     4        0.8
Tyr (Y)     5        1.0
Val (V)    38        7.3
Pyl (O)     0        0.0
Sec (U)     0        0.0
(B)         0        0.0
(Z)         0        0.0
(X)         0        0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
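The two extinction coefficients can be checked by hand from the composition (4 Trp, 5 Tyr, 5 Cys). The sketch below assumes the commonly quoted molar contributions at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine, and takes the molecular weight as 55252.6 Da; the function names are hypothetical and this is not ProtParam's own code.

```python
TRP_EXT = 5500      # assumed molar contribution of one Trp at 280 nm
TYR_EXT = 1490      # assumed molar contribution of one Tyr at 280 nm
CYSTINE_EXT = 125   # assumed molar contribution of one cystine (Cys-Cys pair)

def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Estimated molar extinction coefficient (M-1 cm-1) at 280 nm."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * TRP_EXT + n_tyr * TYR_EXT + n_cystine * CYSTINE_EXT

def abs_01_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight

ext_paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(4, 5, 5, all_cys_paired=False)
abs_paired = round(abs_01_percent(ext_paired, 55252.6), 3)
abs_reduced = round(abs_01_percent(ext_reduced, 55252.6), 3)
```

With five cysteines only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.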
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
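The pairwise scores in this table are percent-identity values. A minimal sketch of such a calculation is given below, assuming the score is the number of identical positions divided by the number of positions compared, with gap positions excluded; the function name `percent_identity` is hypothetical, and the exact procedure inside ClustalW2 may differ.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue              # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# N-terminal fragments of HBEAG_HBVA4 and HBEAG_ASHV as they appear in the
# multiple alignment of this section
frag_hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
frag_ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
score = percent_identity(frag_hbva4, frag_ashv)
```

A score computed on a short fragment will generally differ from the full-length score in the table, since the table values come from pairwise alignments of the complete sequences.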
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss Model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss Model - submit modeling request. (A new browser window is opened; load the PDB file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Your e-mail address: Lakshay1202gmailcom
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044; Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angle φ against ψ for the amino acid residues in a protein structure: it displays the φ and ψ backbone conformational angles for each residue and shows which conformations of these angles are possible for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
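The percentages in the plot statistics follow directly from the residue counts, taken over the 1156 non-glycine, non-proline residues. The short sketch below recomputes them and applies the 90% guideline quoted above; the dictionary keys and the function name are hypothetical labels, not PROCHECK identifiers.

```python
def region_percentages(counts):
    """Percentage of residues per Ramachandran region, one decimal place."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Residue counts as reported in the PROCHECK summary of this section
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = region_percentages(counts)
# Guideline: a good quality model has over 90% in the most favoured regions
meets_guideline = percentages["most favoured"] > 90.0
```

For this model the most favoured fraction falls below the 90% guideline, and about 4.4% of the residues lie in disallowed regions.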
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we conclude
that, although the strains are all closely related, these proteins play an
important role in survival in different species. It would be interesting to
take a closer look at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will soon be possible to draw firm conclusions about the true
picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of a protein present in the different
species, we used homology modelling and took a step forward by presenting a
theoretical model built with available online modelling tools.
Our study indicated that HBeAg-binding protein 4 (glycerate kinase) is one
of the secondary factors in the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study should be useful in the
search for a cure for hepatitis B, and they should also open new avenues
for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and can
therefore aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, so we can look forward to a better,
disease-free world.
The work presented here is only a small part of a large problem, and much
work remains to be done to establish a sound phylogenetic relationship and
a full-fledged cure for hepatitis B. We nevertheless hope that these
findings will go a long way and prove fruitful to anyone working in a
similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Serum derived - prepared from HBsAg purified from the serum of HBV carriers.
Recombinant HBsAg - made by genetic engineering in yeasts.
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following a single episode of exposure to HBV-infected blood,
for example a needle-stick injury.
What is Hepatitis B Infection Like?
Most individuals who become infected with the hepatitis B virus are not
aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.
Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but they frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic carriers,
or those with chronic hepatitis B, are not aware of their ongoing infection,
although some have persistent fatigue.
Molecular virology
Fig. Hepatitis B virus genome
Genome: circular, double-stranded DNA, 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to its 5' end. The other strand, the plus strand, is
variable in length but shorter than unit length, and has an RNA
oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
These proteins are transcribed and translated through the use of multiple
in-frame start codons. The HBV genome also contains elements that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
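The way several proteins can arise from one reading frame through multiple in-frame start codons can be illustrated with a small sketch. The Python below is only an illustration: the function name `in_frame_starts` and the toy sequence are assumptions made here, and the sequence is not taken from the HBV genome. For one reading frame it lists every ATG that shares the same downstream stop codon, so each start yields a shorter product with the same C-terminal end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(dna, frame=0):
    """Return (start, end) index pairs for each in-frame ATG and its stop codon.

    All ATG codons seen before a given stop codon in the chosen frame share
    that stop, so they describe nested products of different lengths.
    """
    dna = dna.upper()
    results = []
    pending = []  # ATG positions seen since the last in-frame stop codon
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "ATG":
            pending.append(i)
        elif codon in STOP_CODONS:
            results.extend((start, i + 3) for start in pending)
            pending = []
    return results


# Toy sequence (hypothetical): three in-frame ATGs followed by one stop codon.
toy = "ATGAAAATGCCCATGGGGTAA"
for start, end in in_frame_starts(toy):
    print(start, end, toy[start:end])
```

In the toy example the three products start at positions 0, 6 and 12 and all end at the same stop codon, which mirrors how nested proteins of different lengths can be read from a single ORF.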
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known
that the virion initially attaches to a susceptible hepatocyte through recognition
of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA
then enters the nucleus, where it is known to form a covalently closed circular
form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a
longer-than-genome-length RNA called the pregenome and of shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by
ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are
destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 3' end of its own transcript, where viral DNA
synthesis eventually occurs. At the same time as capsid formation, the RNA-P
protein complex is packaged and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus, where the
process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA
and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral arrangement. The
nucleocapsid also contains at least one hepatitis B polymerase protein (P),
along with the HBV genome.
In infected people, virions actually make up a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present, and
these usually outnumber the virions in a ratio of 100:1. The two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to together as surface antigen particles. The sphere contains both
middle and small surface proteins, whereas the filament also includes the
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the
infection. Since the non-infectious particles present the same sites as the
virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel through the bloodstream undetected by antibodies
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen (Au antigen).
It is very hydrophobic, containing four transmembrane-spanning regions. This
protein is the prime constituent of all hepatitis B particle forms and appears
to be manufactured by the virus in high quantities. It also contains a highly
antigenic epitope, which may be responsible for triggering the immune response.
Despite the high antigenicity and prevalence of these particles, the immune
system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by analyzing
an infected hepatocyte. It is a 185-amino-acid protein expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. The serum of individuals infected with
hepatitis B contains three distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane
particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed, with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated
assumption that accurate models can be built for query sequences that have
greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements in the quality of
homology models.
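The 30% rule discussed above depends on how sequence identity is measured. As a minimal sketch, the Python below computes percent identity from two already-aligned sequences. The function names, the gap character "-", and the choice to divide by the number of aligned (non-gap) columns are assumptions made for this illustration; other tools use other denominators, so the values are not directly comparable with any particular program's output.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # skip gapped columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def above_threshold(identity, threshold=30.0):
    """True if the identity exceeds the rule-of-thumb threshold."""
    return identity > threshold


# Hypothetical aligned fragments, used only to exercise the function.
identity = percent_identity("ACDE-FG", "ACNEQFG")
print(round(identity, 1), above_threshold(identity))
```

A template that passes this check is not guaranteed to give an accurate model; the threshold is only the rule of thumb described in the text.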
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Materials and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for the storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of the "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment programs.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
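The Smith-Waterman algorithm mentioned above finds the best-scoring local alignment by dynamic programming. The Python below is a minimal sketch of that idea and is not the SSEARCH implementation. It assumes a simple match/mismatch score and a linear gap penalty rather than a substitution matrix with affine gaps, and the function name and the default scores are illustrative choices.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences: the shared segment is found inside the longer one.
print(smith_waterman("WWHEAGAWW", "HEAGA"))
```

The raw score alone does not say whether two sequences are homologous; as the text notes, that judgement relies on the similarity statistics the package computes, which this sketch does not attempt.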
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and to identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the
success rate in the prediction of the secondary structure of proteins. Its
authors report improvements brought about by predicting all the sequences
of a set of aligned proteins belonging to the same family. This improved
SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a
three-state description of the secondary structure (α-helix, β-sheet and coil)
in a whole database containing 126 chains of non-homologous (less than
25% identity) proteins. Joint prediction with SOPMA and a neural networks
method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted
amino acids. Predictions are available by email to deleage@ibcp.fr or on a
Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) from literature references, journals available online, and PubMed literature for the HBV strain.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure through a BLAST similarity search (PDB ID: 2B8N).
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
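Since model quality is tied to sequence identity, it can help to see how a percent identity is obtained from a pairwise alignment. The sketch below is one possible definition, assuming identity is counted over columns where neither sequence has a gap; other tools use different denominators (for example the shorter sequence length), so the function name and convention here are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

For the toy alignment `ACD-F` against `ACEGF`, four columns are gap-free and three of them match, so the function returns 75.0.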
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
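The "sufficiently low E-value" rule can be sketched as a simple filter over search hits. The hit records, the template identifiers and the cutoff of 1e-5 below are hypothetical values chosen for illustration; the text does not prescribe a specific threshold.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first.

    The default cutoff is an assumption for illustration only.
    """
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["evalue"])


# Hypothetical hits, not taken from any real search.
example_hits = [
    {"id": "templA", "evalue": 3e-40},
    {"id": "templB", "evalue": 2.5},
    {"id": "templC", "evalue": 1e-12},
]
candidates = select_templates(example_hits)
```

Here `templB` is dropped because an E-value of 2.5 suggests the match could easily arise by chance, while the remaining two candidates would still need to be compared on coverage and model plausibility.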
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus the interfaces are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible
receptor or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred.
If the van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
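The six rigid-body degrees of freedom (three rotational, three translational) can be made concrete with a short sketch that repositions a set of atomic coordinates. The function names and the Rz·Ry·Rx rotation order are assumptions made for illustration; docking programs such as HEX use their own internal parameterizations.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): Rz @ Ry @ Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(coords, rotation, translation):
    """Apply a rigid-body move (rotate, then translate) to (x, y, z) tuples."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        ))
    return moved
```

A rigid-body search varies only these six numbers, so interatomic distances within the moved molecule are preserved; any conformational flexibility has to be modeled on top of this.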
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study will
be useful in finding a cure for the infectious disease bird flu; they will also
open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and
hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components.
It is a molecule within a cell or on the cell surface to which a substance
(such as a hormone or a drug) selectively binds, causing a change in the
activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG‡ (the free
energy of activation) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ
(psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ, with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were
treated as hard spheres with dimensions corresponding to their van der
Waals radii. Angles that cause spheres to collide correspond to sterically
disallowed conformations of the polypeptide backbone.
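Each point on such a plot is a pair of torsion angles, and a torsion angle is computed from four consecutive atom positions. The following is a minimal sketch using the standard atan2 formulation; the function name is an assumption. By the usual convention, φ is taken over the atoms C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(u1, u2), _cross(u2, u3)  # normals of the two planes
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement of the four points gives 0 degrees and a trans arrangement gives 180 degrees (in magnitude), which matches the range of the plot axes.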
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed
residue-by-residue listing. PROCHECK generates a number of output files in
the default directory, which have the same name as the original PDB file but
with different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
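Gradient-based minimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones potential, where the only degree of freedom is their separation r. This is a toy sketch under stated assumptions (reduced units with epsilon = sigma = 1, a fixed step size, plain steepest descent); real force fields sum many bonded and non-bonded terms and use more robust optimizers.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent on r: move against the gradient until it vanishes."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:  # net force is (numerically) zero: a local minimum
            break
        r -= step * g
    return r
```

Starting from r = 1.5, the separation relaxes to the nearby minimum at 2^(1/6) in these units. As the text notes, such a downhill walk only finds the nearest minimum and cannot cross energy barriers.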
Results and discussion
1. Swiss-Prot entry: Protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                    Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5           206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina           197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                 192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I            27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
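The two extinction coefficients above can be reproduced from the residue counts in the composition table. The sketch below uses the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125); the function names are assumptions, and the counts (5 Tyr, 4 Trp, 5 Cys) and molecular weight (55252.6) come from the output listed here.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    # Two cysteines form one cystine; an odd cysteine is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_0_1_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)
```

With these inputs the two cases give 29700 and 29450, and dividing by the molecular weight gives 0.538 and 0.533 after rounding, in agreement with the reported values.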
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file); choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window will open; load the pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044, Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which conformations are possible for a polypeptide, so residues whose backbone angles fall in sterically disallowed regions can be identified.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.
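Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N, CA, C and ψ from N, CA, C, N(i+1). The sketch below shows one common way to compute such an angle from Cartesian coordinates. The function name `dihedral_degrees` and the toy coordinates are illustrative assumptions; they are not taken from the SAVS or PROCHECK code.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from any structure in this report): a 90 degree twist.
angle = dihedral_degrees((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0))
print(round(angle, 1))
```

For φ of residue i, the four points would be the C atom of residue i-1 followed by N, CA and C of residue i; for ψ, they would be N, CA and C of residue i followed by N of residue i+1.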
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         SCORE    %age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
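The percentages in the PROCHECK summary can be re-derived from the residue counts, which is a useful sanity check on a transcribed table. A short sketch using the counts reported above; the variable names are illustrative, and the 90% threshold is the one quoted in the PROCHECK note.

```python
# Residue counts per Ramachandran region, as reported in the PROCHECK summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and chain-end residues are excluded from the plot statistics.
non_gly_pro = sum(region_counts.values())
percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

end_residues, glycines, prolines = 8, 127, 60
total_residues = non_gly_pro + end_residues + glycines + prolines

meets_quality_threshold = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:>20}: {pct:5.1f}%")
print("total residues:", total_residues)
print("over 90% in most favoured regions:", meets_quality_threshold)
```

On these numbers the model falls short of the 90% guideline, with 85.6% of residues in the most favoured regions and 4.4% in disallowed regions.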
Docking result (Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
----   --------- --------- --------- --------- ----------------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
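The size of the search space reported in the Hex log can be cross-checked by hand. The sketch below assumes the garbled log entries read as 27 distance steps, 812 receptor rotations, 1 ligand rotation and a 64 x 24 x 48 FFT grid. That reading is inferred from the totals printed in the log and is not confirmed by the source; all variable names are illustrative.

```python
# Distance scan assumed from the log: 0.00 to 19.50 in steps of 0.75.
r_min, r_max, r_step = 0.0, 19.5, 0.75
n_distance_steps = round((r_max - r_min) / r_step) + 1

# Assumed rotational samples for receptor and ligand at N=16.
receptor_rotations, ligand_rotations = 812, 1

# One 3D FFT is run per (distance, receptor rotation, ligand rotation) triple.
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the assumed 64 x 24 x 48 grid.
fft_grid = (64, 24, 48)
cells_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]

total_orientations = n_ffts * cells_per_fft

print(n_distance_steps, n_ffts, total_orientations)
```

Both products agree with the figures printed in the log (21924 3D FFTs and 1616412672 orientations), which supports this reading of the damaged lines.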
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will become possible in the near future to draw firm conclusions about the true picture of evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein present in the different species, we carried out homology modelling and put forward a theoretical model built with available online modelling tools.
According to our study, the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.
Molecular virology
Genome: circular, double-stranded and 3.2 kb in size. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are transcribed and translated through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of a longer-than-genome-length RNA called the pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
Early after infection, the DNA is recirculated to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, and these usually outnumber the virions in the ratio of 100:1. The two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the bloodstream (Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected directly by blood test; this antigen can only be isolated by analyzing an infected hepatocyte. A 185 amino acid protein, it is expressed in the cytoplasm of infected cells and is highly associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus within
the family Hepadnaviridae. Serum of individuals infected with hepatitis B
contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm
particle (containing DNA and DNA polymerase) called the Dane particle, and
tubular or filamentous particles that vary in length. The Dane particle is the
infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any
body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed, with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
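
To make the 30% rule concrete, the following is a minimal sketch of how percent sequence identity could be computed from a pairwise alignment. It assumes the two sequences are already aligned with '-' as the gap character and counts identity over aligned (non-gap) columns only; other definitions of identity exist, and the function names and the default threshold argument are illustrative, not taken from any particular tool.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are excluded from the denominator
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when identity exceeds the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment: 4 identical residues out of 5 aligned columns.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

A template falling below the threshold is not automatically unusable, but the alignment behind it deserves closer inspection before a model is built.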
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4). Primary accession number: Q8IVS8. EC 2.7.1.31. From human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
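
The Smith-Waterman algorithm mentioned above can be illustrated with a short dynamic-programming sketch. This is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values below are arbitrary illustrative choices), whereas real searches use substitution matrices such as BLOSUM and affine gap penalties. It returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # The floor at 0 is what makes the alignment local:
            # a poorly scoring prefix is simply dropped.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGXX", "YYACGYY"))
```

Because every cell of the matrix is filled, the method is guaranteed to find the optimal local score under the chosen scoring scheme, at the cost of time proportional to the product of the two sequence lengths. The heuristic FASTA and BLAST searches trade that guarantee for speed.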
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet, and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important is the coverage of the
aligned regions: the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
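
The E-value-based first pass over database search hits described above could look like the following minimal sketch. The hit records, template identifiers, and the cutoff of 1e-5 are all hypothetical choices made for illustration; the text does not prescribe a specific threshold, and a real workflow would parse hits from the search tool's output rather than hard-code them.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str   # hypothetical identifier of the template structure
    score: float       # alignment score reported by the search tool
    e_value: float     # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


if __name__ == "__main__":
    # Illustrative hits only; not taken from any real search.
    hits = [
        Hit("tmplB", 50.0, 1e-10),
        Hit("tmplC", 27.0, 6.0),      # poor E-value: rejected
        Hit("tmplA", 200.0, 1e-50),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.e_value)
```

Candidates that survive the cutoff would then be compared on coverage, function, and secondary-structure agreement, as the paragraphs above describe.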
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule in or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
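
As an illustration of how the torsion angles plotted on a Ramachandran plot can be obtained, here is a minimal sketch that computes a dihedral angle from four atom positions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1). The function name and the toy coordinates are illustrative; reading real coordinates from a PDB file is left out.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the outer bonds are perpendicular when viewed
    # along the central bond.
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Computing φ and ψ for every residue and plotting the pairs gives the scatter that validation tools compare against the sterically allowed regions.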
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and may stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
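
The idea of moving atoms along the forces until the forces become small can be sketched for the simplest possible case: two atoms interacting through a Lennard-Jones potential, starting too close together. This is an illustrative toy in reduced units (epsilon = sigma = 1), not the procedure of any particular modelling package; the step size, the cap on the displacement per step, and the convergence tolerance are assumptions chosen so the example converges.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force along r, F = -dE/dr (positive values push the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize_distance(r0: float, step: float = 1e-3, max_move: float = 0.05,
                      tol: float = 1e-6, max_steps: int = 10000) -> float:
    """Steepest descent on the interatomic distance."""
    r = r0
    for _ in range(max_steps):
        f = lj_force(r)
        if abs(f) < tol:
            break  # net force is small: a (local) minimum has been reached
        # Move along the force; cap the move so a huge repulsion
        # cannot throw the atoms far apart in a single step.
        move = max(-max_move, min(max_move, step * f))
        r += move
    return r


if __name__ == "__main__":
    start = 0.9                      # atoms too close: large repulsion
    r_min = minimize_distance(start)
    print(lj_energy(start), lj_energy(r_min), r_min)
```

The minimizer only walks downhill from its starting point, which is why, as noted above, minimization finds the nearest minimum and does not cross high energy barriers.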
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue    Count   Percent
Ala (A)      74     14.1%
Arg (R)      33      6.3%
Asn (N)      11      2.1%
Asp (D)      21      4.0%
Cys (C)       5      1.0%
Gln (Q)      32      6.1%
Glu (E)      28      5.4%
Gly (G)      51      9.8%
His (H)      16      3.1%
Ile (I)      15      2.9%
Leu (L)      81     15.5%
Lys (K)      10      1.9%
Met (M)      12      2.3%
Phe (F)      10      1.9%
Pro (P)      28      5.4%
Ser (S)      27      5.2%
Thr (T)      22      4.2%
Trp (W)       4      0.8%
Tyr (Y)       5      1.0%
Val (V)      38      7.3%
Pyl (O)       0      0.0%
Sec (U)       0      0.0%
(B)           0      0.0%
(Z)           0      0.0%
(X)           0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N    711
Oxygen    O    719
Sulfur    S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
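The two extinction coefficients above can be checked by hand from the composition table. The sketch below assumes the per-residue molar values that ProtParam documents for 280 nm in water (Trp 5500, Tyr 1490, cystine 125); the function names are illustrative and not part of ProtParam itself.

```python
# Assumed molar extinction values at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient of a protein at 280 nm.

    When all_cys_paired is True, every pair of Cys residues is counted
    as one cystine; an unpaired leftover Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight


# Values taken from the ProtParam report for GLCTK_HUMAN.
mw = 55252.6
ext_paired = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=True)
ext_reduced = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=False)
abs_paired = round(absorbance_1_g_per_l(ext_paired, mw), 3)
abs_reduced = round(absorbance_1_g_per_l(ext_reduced, mw), 3)
```

With 4 Trp, 5 Tyr and 5 Cys this reproduces the reported 29700 and 29450, and dividing by the molecular weight gives the reported absorbances.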
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA
Alpha helix     (Hh)   235 is 44.93%
3-10 helix      (Gg)     0 is  0.00%
Pi helix        (Ii)     0 is  0.00%
Beta bridge     (Bb)     0 is  0.00%
Extended strand (Ee)    80 is 15.30%
Beta turn       (Tt)    36 is  6.88%
Bend region     (Ss)     0 is  0.00%
Random coil     (Cc)   172 is 32.89%
Ambiguous states (?)     0 is  0.00%
Other states             0 is  0.00%
Parameters
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
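The pairwise scores in this table can be read as percent identities. The sketch below shows one way such a score might be computed from two rows of an alignment, assuming identity is counted only over columns where neither sequence has a gap; the function name is illustrative, and ClustalW2 derives its own scores from separate pairwise alignments, so the numbers need not match the table exactly.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two equal-length aligned rows.

    Columns where either row holds a gap are skipped, so the result
    is identical residues divided by residues actually compared.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# N-terminal fragments of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
fragment_identity = percent_identity(hbva4, hbva5)
```

On this 30-residue fragment the two strains differ at a single position (V versus F), which is consistent with the high scores reported between the HBV strains.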
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
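The guide tree is written in Newick notation, where each name is followed by its branch length and parentheses group sister branches. As a minimal sketch (the parser and helper names are illustrative, and it handles only the simple unquoted form used here), the tree can be read and the root-to-leaf distances summed as follows:

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    text = "".join(text.split())  # drop line breaks and spaces
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


GUIDE_TREE = """
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
"""

depths = leaf_depths(parse_newick(GUIDE_TREE))
```

The human glycerate kinase sits on by far the longest path from the root, while the HBV strains cluster on very short branches.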
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (pdb file).
2. Choose icon: Swiss model - load the raw target sequence.
3. Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit.
4. Choose icon: File - save - layer (pdb).
5. Choose icon: File - save - project (pdb).
6. Choose icon: Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
7. Open the new structure (received by email) and remove the template by selecting the target.
8. Choose icon: File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process.
The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations of the two angles are possible for a polypeptide. Because the distance between two successive alpha-carbon atoms is essentially fixed by the planar peptide bond, these two angles largely determine the conformation of the backbone chain.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
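Each point on a Ramachandran plot comes from two dihedral angles computed from backbone atom coordinates: φ is defined by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below uses the standard atan2 formulation for a signed dihedral; the function names and the toy coordinates are illustrative and are not taken from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles (phi, psi) of one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry around a central bond along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying phi_psi to every residue of a structure yields the (φ, ψ) pairs that PROCHECK then sorts into favoured, allowed and disallowed regions.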
SAVES results for proj_gunjan.pdb
Procheck summary
Ramachandran plot
Plot statistics
                                                        No. of
                                                       residues   %-tage
Residues in most favoured regions [A,B,L]                  990     85.6%
Residues in additional allowed regions [a,b,l,p]           104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11      1.0%
Residues in disallowed regions                              51      4.4%
                                                          ----    ------
Number of non-glycine and non-proline residues            1156    100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
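The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. This is a minimal sketch using the counts reported above; the function and variable names are illustrative.

```python
def ramachandran_percentages(counts):
    """Convert per-region residue counts to percentages (one decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts of non-glycine, non-proline residues from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = ramachandran_percentages(counts)
scored_residues = sum(counts.values())

# End residues, glycines and prolines are counted separately by Procheck.
total_residues = scored_residues + 8 + 127 + 60

# Guideline quoted by Procheck: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
```

For this model the most favoured fraction is 85.6%, so the guideline of over 90% is not met.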
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00, 0.75, 1.50, 2.25, 3.00, 3.75, 4.50, 5.25, 6.00,
    6.75, 7.50, 8.25, 9.00, 9.75, 10.50, 11.25, 12.00, 12.75,
    13.50, 14.25, 15.00, 15.75, 16.50, 17.25, 18.00, 18.75, 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00, 0.40, 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair               RMS  Bumps
----  --------- --------- --------- ---------  ----------------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
Conclusion
After analyzing the protein sequences of the hepatitis B virus (HBV), we
conclude that although they are all closely related, each plays an important
role in survival in different species. It would be interesting to take a
closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of its
evolution in the near future, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein, which is
encoded by a gene, is one of the secondary causes of HBV pathogenicity. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in the future by analyzing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking study should be useful
in the search for a cure for hepatitis B, and should also open new
avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope, however, that these findings
will go a long way and will prove fruitful to anyone working in a similar
area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology,
6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller,
W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas,
M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Fig. Hepatitis B virus genome
organization with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length but always less than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins.
These proteins are transcribed and translated through the use of multiple
in-frame start codons. The HBV genome also contains parts that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell that is
capable of supporting its replication. Although hepatocytes are known to be the most
effective cell type for replicating HBV, other types of cells in the human body have
been found to support replication to a lesser degree.
The initial steps following HBV entry are not clearly defined,
although it is known that the virion initially attaches to a susceptible hepatocyte
through recognition of a cell surface receptor that has yet to be identified (Garces,
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for
transcription by RNA polymerase II of a longer-than-genome-length RNA called the
pregenome and of shorter subgenomic transcripts, all of which serve as mRNAs. The
shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic
reticulum, and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P,
which then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
Early in the infection, the DNA is recirculated
to the nucleus, where the process is repeated, resulting in the accumulation of 10
to 30 molecules of cccDNA and an increase in viral mRNA concentrations
(Flint et al., 765).
Fig HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis
B surface proteins. The envelope surrounds the inner nucleocapsid, which is
made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present and usually outnumber the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to collectively as surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein. The absence of the hepatitis B
core, polymerase and genome makes these particles non-infectious. High
levels of these non-infectious particles can be found during the acute phase
of the infection.
Since the non-infectious particles present the same sites as the virion, they
induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the blood stream
(Garces, HBVP).
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B surface antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four membrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope, which may be
responsible for triggering the immune response. Regardless of the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.
Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by a blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. It is a 185-amino-acid protein expressed in
the cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by a blood test.
If found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. The serum of individuals
infected with hepatitis B contains three distinct antigenic particles: a
spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles
that vary in length. The Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mitogenic and mutagenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates that the person, including a chronic carrier, has been exposed to
HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
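The 30% rule can be checked on any pairwise alignment. The following is a minimal sketch, assuming that identity is measured over the columns in which neither sequence has a gap (other definitions of the denominator exist) and that gaps are written as "-". The function names and the example sequences are illustrative and are not taken from this study.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the columns where both sequences have a residue.

    Assumption: both strings come from the same pairwise alignment,
    have equal length, and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def above_identity_rule(aligned_query: str, aligned_template: str,
                        threshold: float = 30.0) -> bool:
    """True if the pair exceeds the (rule-of-thumb) identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment: 7 aligned columns, 6 of them identical.
print(round(percent_identity("MKT-AYIA", "MKTLAYLA"), 1))
```

A template passing this check is only a candidate; coverage and alignment quality still have to be inspected.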
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed in recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for the storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number - Q8IVS8
EC 2.7.1.31
Subcellular location - cytoplasm
Catalytic activity - ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of the "FAST-P" (protein)
and "FAST-N" (nucleotide) alignment tools.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
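To make the Smith-Waterman idea concrete, here is a minimal sketch of optimal local alignment with a linear gap penalty. It is not the SSEARCH implementation: the scoring values (match, mismatch, gap) are illustrative assumptions, whereas real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b).

    Simplified local alignment: linear gap penalty and a flat
    match/mismatch score (assumed values, for illustration only).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy nucleotide example: only the shared core is reported.
print(smith_waterman("GGGACGTCCC", "TTACGTAA"))
```

Because every cell is floored at zero, unrelated flanking regions do not penalize the shared core, which is what distinguishes local from global alignment.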
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and to identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility of entering start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
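Two of the parameters listed above can be computed directly from amino acid composition. The sketch below is not the ProtParam code; it applies the commonly cited formulas, namely the aliphatic index X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and a 280 nm extinction coefficient built from per-residue values of 1490 (Tyr), 5500 (Trp) and 125 (cystine). The function names and the assumption that all cysteines pair into cystines are illustrative. Half-life and instability index need lookup tables and are not covered.

```python
from collections import Counter


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * counts[residue] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(sequence: str, cystines_formed: bool = True) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumption: if cystines_formed is True, every pair of Cys residues
    is counted as one cystine; otherwise Cys contributes nothing.
    """
    counts = Counter(sequence.upper())
    value = 1490 * counts["Y"] + 5500 * counts["W"]
    if cystines_formed:
        value += 125 * (counts["C"] // 2)
    return value


# Toy peptide, not a sequence from this study.
print(aliphatic_index("AVIL"), extinction_coefficient("WYCC"))
```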
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) using a BLAST similarity search
Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor
Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
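The sequence retrieval step in the protocol above produces a FASTA-format text. As a minimal sketch of how such a text can be read before it is passed to later steps, the parser below collects each header line (starting with ">") and joins the sequence lines that follow it. The function name and the example record are hypothetical; the example is not the sequence used in this work.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record used only to show the format.
example = ">hypothetical_entry example protein\nMKTAYIAK\nQRQISFVK\n"
print(parse_fasta(example))
```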
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
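The agreement figures quoted above (about 2 Å versus 4-5 Å) are root-mean-square deviations between matched Cα atoms. A minimal sketch of that measure follows, assuming the two structures have already been superimposed and that the coordinate lists are matched residue by residue; the function name and the toy coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD (in the input units, e.g. angstroms) between matched atoms.

    Assumption: coords_a[i] and coords_b[i] are (x, y, z) tuples for the
    same residue, and the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates: every atom is displaced by 2 units along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model, reference))
```

In practice an optimal superposition is computed first; without it, the value also reflects the arbitrary placement of the two coordinate frames.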
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
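Coverage, as used above, is simply the fraction of query residues that fall inside the aligned (modelled) range. A minimal sketch, assuming an inclusive residue range and using illustrative numbers:

```python
def query_coverage(first_modelled, last_modelled, query_length):
    """Fraction of the query covered by an inclusive residue range."""
    if not (1 <= first_modelled <= last_modelled <= query_length):
        raise ValueError("residue range must lie within the query sequence")
    return (last_modelled - first_modelled + 1) / query_length


# Illustrative values: residues 83 to 514 of a 523-residue query.
coverage = query_coverage(83, 514, 523)
```

Candidate templates can then be compared on coverage alongside sequence identity before a final choice is made.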
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand and flexible receptor often maximizes the
interface area between the molecules. The molecules move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement, which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study may be
useful in finding a cure for the infectious disease bird flu, and may also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (Delta G of activation) of the reaction, which results in the rate
enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the phi and psi angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles phi and psi. Ramachandran used computer
models of small polypeptides to systematically vary phi and psi with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
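The torsion angles plotted on a Ramachandran plot can be computed directly from atomic coordinates. The sketch below uses the standard atan2 formula for a signed dihedral angle defined by four points; the helper names and the test coordinates are illustrative assumptions. For a residue i, phi would be the dihedral of C(i-1), N(i), Cα(i), C(i), and psi the dihedral of N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: cis (0 degrees), trans (180 degrees), and +90 degrees.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
quarter = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```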
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
for releasing local constraints on a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
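As a toy illustration of gradient-based minimization, the sketch below relaxes the distance between two atoms interacting through a Lennard-Jones potential, a common model of van der Waals interactions. The reduced units (epsilon = sigma = 1), the fixed step size and the starting distance are assumptions chosen for the example; real minimizers work on all atomic coordinates at once and use more robust step control.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the pair energy with respect to the distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, tol=1e-10, max_iter=100000):
    """Steepest descent: move against the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_start = 1.5                 # illustrative starting distance (reduced units)
r_min = minimize(r_start)     # should approach 2**(1/6) * sigma
e_min = lj_energy(r_min)      # should approach -epsilon
e_clash = lj_energy(0.8)      # two atoms too close: large positive energy
```

The clash energy shows how a single close contact can dominate the total, which is the situation the minimization step is meant to relieve.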
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result-
List of potentially matching sequences-
Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                   192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
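The two extinction coefficients reported by ProtParam can be reproduced from the residue counts. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6 Da; the function name is hypothetical.

```python
# Molar extinction values at 280 nm in water, in M-1 cm-1
# (values documented for ProtParam; treated here as assumptions).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Estimated molar extinction coefficient of the native protein at 280 nm."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


molecular_weight = 55252.6    # Da, as reported for GLCTK_HUMAN
n_trp, n_tyr, n_cys = 4, 5, 5

# All Cys paired: 5 Cys give at most 2 cystines. No Cys paired: 0 cystines.
ext_all_cystine = extinction_coefficient(n_trp, n_tyr, n_cys // 2)
ext_no_cystine = extinction_coefficient(n_trp, n_tyr, 0)

# Absorbance of a 1 g/l (0.1%) solution.
abs_all = round(ext_all_cystine / molecular_weight, 3)
abs_none = round(ext_no_cystine / molecular_weight, 3)
```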
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
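ClustalW writes its guide tree in Newick format, where each leaf appears as name:branch_length. As a small sketch, the leaf branch lengths can be pulled out with a regular expression. The tree string below assumes that the colons, commas and decimal points lost in extraction are restored as in a standard .dnd file; the function name is hypothetical.

```python
import re

# Guide tree with Newick punctuation restored (an assumption about the
# original .dnd file, whose separators were lost in extraction).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length.

    Internal branch lengths follow a ')' and have no name, so the
    pattern skips them.
    """
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
```

The longest terminal branch belongs to the human glycerate kinase sequence, consistent with its very low pairwise scores against the HBeAg sequences.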
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file); choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (pdb)
Choose File - Save - Project (pdb)
Choose Swiss model - submit modeling request (a new browser window will open; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode request submission form
Your Email address: Lakshay1202@gmail.com
Your Name: Lakshay
Request title: Lakshay project
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure; it displays the φ and ψ backbone conformational angles for each residue in a protein and shows the possible conformations of these angles for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score       %
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms
and R-factor no greater than 20%, a good quality model would be expected
to have over 90% in the most favoured regions.
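The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% rule of thumb can be checked in the same step. A minimal sketch, with hypothetical function and key names:

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed):
    """Percentage of non-Gly, non-Pro residues in each region of the plot."""
    counts = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts taken from the plot statistics reported above.
summary = ramachandran_summary(990, 104, 11, 51)

# PROCHECK's guideline: a good quality model has over 90% of residues
# in the most favoured regions.
meets_guideline = summary["most favoured"] > 90.0
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the disallowed residues would be the natural place to look during refinement.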
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV) strains, we
conclude that, although the sequences are all closely related, they play an
important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will become possible in the near future to draw firm conclusions
about the true picture of evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of a protein present in the different
species, we performed homology modelling. We then went a step further and
presented a theoretical model built with available online modelling tools.
The protein studied here, glycerate kinase (HBeAg-binding protein 4), is one
of the secondary factors in the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
HBVP) The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription by RNA pol II of
a longer-than-genome-length RNA, called the pregenome, and of shorter
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum,
and the proteins that are destined to become HBV surface antigens in the
viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where
viral DNA synthesis eventually occurs. At the same time as capsid formation,
the RNA-P protein complex is packaged and reverse transcription begins.
Early after infection, the DNA is recirculated to the nucleus, where the
process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one
infectious particle found within the body of an infected patient. This
virion has a diameter of 42 nm, and its outer envelope contains a high
quantity of hepatitis B surface proteins. The envelope surrounds the inner
nucleocapsid, which is made up of 180 hepatitis B core proteins in an
icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of
HBV-derived particles. Large numbers of smaller subviral particles are also
present, usually outnumbering the virions in a ratio of 100:1. These two
subviral particles, the hepatitis B filament and the hepatitis B sphere, are
often referred to collectively as surface antigen particles. The sphere
contains both middle and small surface proteins, whereas the filament also
includes the large hepatitis B surface protein. The absence of the hepatitis
B core, polymerase, and genome makes these particles non-infectious. High
levels of these non-infectious particles can be found during the acute phase
of the infection. Since the non-infectious particles present the same sites
as the virion, they induce a significant immune response and are thought to
be disadvantageous for the virus. However, it is also believed that the
presence of high levels of non-infectious particles may allow the infectious
viral particles to travel undetected by antibodies through the bloodstream
(Garces, HBVP).
Hepatitis B Antigens
Three different types of hepatitis B antigens are encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg): There are three different hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg),
middle hepatitis B surface antigen (MHBsAg), and large hepatitis B surface
antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins
and has historically been known as the Australia antigen (Au antigen). It is
very hydrophobic, containing four transmembrane-spanning regions. This
protein is the prime constituent of all hepatitis B particle forms and
appears to be manufactured by the virus in high quantities. It also contains
a highly antigenic epitope, which may be responsible for triggering the
immune response. Despite the high antigenicity and prevalence of these
particles, the immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot
be detected directly by blood test; it can only be isolated by analyzing an
infected hepatocyte. This 185-amino-acid protein is expressed in the
cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg): The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with
hepatitis B contains three distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the
Dane particle, and tubular or filamentous particles that vary in length. The
Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
downregulation of viral gene expression and the infection of immunologically
privileged tissues. Chronic liver cell injury and the attendant inflammatory
and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma.
Elucidation of the immunological and virological basis for HBV persistence
may yield immunotherapeutic and antiviral strategies to terminate chronic
HBV infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but virus-host
interactions leading to acute and chronic disease are still poorly
understood. Studies on how the virus evades the immune response to cause
prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed with an
emphasis on past accomplishments as well as goals for the future.
Antigen
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and
indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
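The marker descriptions above can be read as a simple lookup from a detected serum marker to what its presence indicates. The following is a minimal sketch of that lookup in Python, for illustration only: the marker labels and the function name `interpret_markers` are hypothetical choices made for this sketch, and the wording of each note is taken from the list above.

```python
from typing import Dict, List

# Notes copied from the marker list in the text. The dictionary keys are
# hypothetical labels chosen for this sketch, not a clinical standard.
MARKER_NOTES: Dict[str, str] = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def interpret_markers(detected: List[str]) -> List[str]:
    """Return one explanatory note per detected serum marker."""
    notes = []
    for marker in detected:
        if marker not in MARKER_NOTES:
            raise ValueError(f"unknown marker: {marker}")
        notes.append(f"{marker}: {MARKER_NOTES[marker]}")
    return notes


if __name__ == "__main__":
    for line in interpret_markers(["HBsAg", "core IgM"]):
        print(line)
```

The sketch only restates the text; it does not combine markers into a diagnosis, which would need rules that the text does not give.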
Homology, or comparative, modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the refinement
of the structure in an attempt to account for inherent differences between
the template and query structures. As mentioned above, homology modeling
figures heavily as a rationale for structural genomics initiatives, under
the stated assumption that accurate models can be built for query sequences
that have greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.) Advances
in the accuracy of sequence alignments using structure-based profile
methods, such as those described above, should result in continuing
improvements in the quality of homology models.
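The 30% rule discussed above can be checked for a given query-template alignment by computing the percent sequence identity. The sketch below is one way to do this in Python. It assumes the two sequences are already aligned to the same length with "-" as the gap character, and it counts identity over the columns where neither sequence has a gap; other definitions of the denominator are in use, so this choice is an assumption of the sketch. The function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over the alignment columns that contain no gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if q == t:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """True if identity is strictly greater than the threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment: 5 identical residues out of 6 gap-free columns.
    print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
```

A template that passes this check is only a candidate; as the text notes, the quality of the alignment itself still determines the quality of the model.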
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making
optimal use of chemical information represents an efficient knowledge-based
strategy for improving binding affinity estimations, ligand binding-mode
predictions, and virtual screening enrichments obtained from protein-ligand
docking.
This review gives an introduction to ligand-receptor docking and illustrates
the basic underlying concepts. An overview of different approaches and
algorithms is provided. Although the application of docking and scoring has
led to some remarkable successes, there are still some major challenges
ahead, which are outlined here as well. Approaches to address some of these
challenges and the latest developments in the area are presented. Some
aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are
described.
Materials and methods
Bioinformatics is an interdisciplinary research area at the interface
between computer science and biological science. It involves the technology
that uses computers for storage, retrieval, manipulation, and distribution
of information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis
of genes and genomes and their corresponding products, and is often
considered computational molecular biology. It consists of two subfields:
the development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome data,
and disseminates biomedical information, all for the better understanding of
molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its
domains structure, post-translational modifications, variants, etc.), a
minimal level of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The
original FASTP program was designed for protein sequence similarity
searching. FASTA, described in 1988 ("Improved Tools for Biological
Sequence Comparison"), added the ability to do DNA:DNA searches and
translated protein:DNA searches, and also provided a more sophisticated
shuffling program for evaluating statistical significance. There are several
programs in this package that allow the alignment of protein sequences and
DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet; it is an extension of the "FAST-P"
(protein) and "FAST-N" (nucleotide) alignment tools.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
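The Smith-Waterman algorithm mentioned above can be illustrated with a minimal sketch. This is not the SSEARCH implementation: the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration, whereas real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only (assumed values): a fixed match/mismatch
    score and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only two rows of the matrix are kept because the sketch reports the score alone; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.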
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and to identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either
as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will
be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database
containing 126 chains of non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
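The percentage of correctly predicted residues quoted above is a three-state per-residue accuracy. A minimal sketch of how such a figure can be computed is given below; the function name and the single-letter state codes (H for helix, E for sheet, C for coil) are assumptions for illustration and are not taken from the SOPMA paper.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one.

    Both arguments are equal-length strings of per-residue state codes,
    here assumed to be H (helix), E (sheet) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        return 0.0
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


if __name__ == "__main__":
    print(round(three_state_accuracy("HHHEEC", "HHHECC"), 2))
```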
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) using a BLAST similarity search.
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main-chain and side-chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of
a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds. The
chief inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which
improve upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
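Two of the template-selection criteria discussed above, sequence identity and coverage, can be computed directly from a pairwise alignment. The sketch below is illustrative only: the function names are hypothetical, gaps are assumed to be written as "-", and identity is taken over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of aligned (gap-free) columns where both residues are identical."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def query_coverage(aligned_query, aligned_template):
    """Percent of query residues that are aligned to a template residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(q != "-" for q in aligned_query)
    if query_length == 0:
        return 0.0
    covered = sum(q != "-" and t != "-"
                  for q, t in zip(aligned_query, aligned_template))
    return 100.0 * covered / query_length


if __name__ == "__main__":
    print(percent_identity("ACDE-F", "ACNEKF"))
    print(query_coverage("ACDEF", "AC--F"))
```

A template with high identity but low coverage leaves much of the query without structural information, which is why the two numbers are worth reporting together.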
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do
not have the ability to alter their conformation in order to improve binding
and ease movement. Conformational changes are limited by steric
constraints, and the interactions are therefore said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to
as a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor, or
a flexible ligand with a rigid receptor.
Fig: Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals forces can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility
in the receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing for
easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study will
be useful in finding a cure for the infectious disease bird flu; they will also
open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and
hence can aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components.
It is a molecule within or on a cell surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of
the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against
ψ (psi) of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide the
main chain N-Cα and Cα-C bonds are relatively free to rotate; these
rotations are represented by the torsion angles φ and ψ, respectively.
Ramachandran used computer models of small polypeptides to systematically
vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms.
Atoms were treated as hard spheres with dimensions corresponding to their
van der Waals radii. The angles which cause spheres to collide correspond
to sterically disallowed conformations of the polypeptide backbone.
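A torsion angle such as φ or ψ is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The sketch below shows one common way to compute such an angle from Cartesian coordinates. The function names are hypothetical and the coordinates in the usage line are made-up points rather than atoms from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for four points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the fourth point is rotated a quarter turn
    # about the central bond relative to the first point.
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Plotting the (φ, ψ) pair of every residue on a 360° by 360° grid gives the Ramachandran plot used later for model validation.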
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
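The idea of gradient optimization can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential, with the interatomic distance as the only degree of freedom. This is a toy sketch, not the minimizer used by any modeling package; the function names, the reduced units (epsilon = sigma = 1), and the step size are assumptions for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at distance r (reduced units assumed)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r_start, step=0.01, tolerance=1e-8, max_iterations=100000):
    """Steepest descent: move against the gradient until the force is tiny."""
    r = r_start
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient
    return r


if __name__ == "__main__":
    r_min = minimize_distance(1.5)
    print(r_min, lj_energy(r_min))
```

Starting from a stretched distance of 1.5, the descent settles at the analytical minimum of the potential, r = 2^(1/6) sigma. Because each step only follows the local slope, the same procedure applied to a full protein stops in the nearest local minimum, as described above.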
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db AC                   Description                                             Score  E-value
pdb 1QGT-C              Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5   206    6e-54
pdb 2QIJ-C              Chain C, Hepatitis B Capsid Protein With An N-Termina   197    2e-51
pdb 2G33-C CAPSD_HBVD1  Chain C, Human T4 Capsid Strain Ad                      192    6e-50
pdb 1TA3-B XIP1_WHEAT   Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb 1AW9-A              Chain A, Structure Of Glutathione S-Transferase Iii I   27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
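The two extinction coefficients above can be reproduced from the amino acid composition. The sketch below uses the per-residue molar extinction values at 280 nm applied by ProtParam (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the function names are hypothetical, and the residue counts and molecular weight are the ones reported above for GLCTK_HUMAN.

```python
EXT_TYR = 1490      # M-1 cm-1 per tyrosine at 280 nm
EXT_TRP = 5500      # M-1 cm-1 per tryptophan at 280 nm
EXT_CYSTINE = 125   # M-1 cm-1 per cystine (disulfide-bonded Cys pair)


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_as_cystine):
    """Molar extinction coefficient at 280 nm from residue counts."""
    # An odd number of Cys leaves one residue unpaired, hence integer division.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return EXT_TYR * n_tyr + EXT_TRP * n_trp + EXT_CYSTINE * n_cystine


def absorbance_one_gram_per_litre(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    # Counts taken from the composition table: Tyr 5, Trp 4, Cys 5.
    for paired in (True, False):
        ext = extinction_coefficient(5, 4, 5, paired)
        print(ext, round(absorbance_one_gram_per_litre(ext, 55252.6), 3))
```

With 5 Tyr, 4 Trp and 5 Cys this gives 29700 (two cystines) and 29450 (no cystines), and dividing by the molecular weight of 55252.6 gives the absorbances 0.538 and 0.533 listed above.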
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh)      235 is 44.93%
3-10 helix (Gg)         0 is  0.00%
Pi helix (Ii)           0 is  0.00%
Beta bridge (Bb)        0 is  0.00%
Extended strand (Ee)   80 is 15.30%
Beta turn (Tt)         36 is  6.88%
Bend region (Ss)        0 is  0.00%
Random coil (Cc)      172 is 32.89%
Ambiguous states        0 is  0.00%
Other states            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
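To read the branch lengths out of the guide tree, a minimal Python sketch is given here. It assumes the flattened tree is a Newick string whose colons, commas and decimal points were lost in extraction (so `059519` is read as `:0.59519`); the restored string and the helper name `leaf_branch_lengths` are illustrative assumptions, not part of the original analysis.

```python
import re

# Guide tree as a Newick string. The ":" "," and "." characters are restored
# under the assumption that they were stripped from the flattened output.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for every leaf in a Newick string.

    Leaves are recognised by the "ACCESSION|ENTRY_NAME" label used in this
    report; internal-node lengths follow ")" and are therefore skipped.
    """
    pairs = re.findall(r"([A-Z0-9]+\|[A-Z0-9_]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)

# Print the leaves from the longest to the shortest terminal branch.
for name, length in sorted(leaves.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {length:.5f}")
```

Under this reading, the human glycerate kinase sequence sits on by far the longest terminal branch, while the HBV genotype A sequences have very short ones.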
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
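As an illustration of how a percent-identity figure of this kind is obtained, the sketch below compares two aligned rows. It is a minimal example under stated assumptions: identity is counted over columns where neither sequence has a gap (other tools use other denominators), the function name `percent_identity` is hypothetical, and the two rows are one 60-column block of the DHBV1 and DHBV3 sequences from the multiple alignment in this report, not the target and template pair.

```python
def percent_identity(row_a, row_b):
    """Percent identity between two aligned rows of equal length.

    Columns in which either row has a gap ("-") are ignored, so the
    denominator is the number of residue-to-residue columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# One block of the alignment (rows ending at positions 224).
DHBV1 = "TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR"
DHBV3 = "TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR"

identity = percent_identity(DHBV1, DHBV3)
print(f"{identity:.1f}% identity over this block")
```

The same function can be applied to the full target and template rows once they are aligned.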
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
2) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
3) Choose File - save - layer (pdb).
4) Choose File - save - project (pdb).
5) Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure should be received.)
6) Open the new structure (received by e-mail) and remove the template by selecting the target.
7) Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Each angle is defined by four consecutive backbone atoms (C-N-Cα-C for φ and N-Cα-C-N for ψ), so residues that fall outside the allowed regions point to local errors in the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
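The dihedral angles plotted in a Ramachandran plot can be computed from atomic coordinates. The following Python sketch shows one common way to do this for any four points. It is not the code used by SAVS or PROCHECK, the function name `dihedral` is an illustrative choice, and the coordinates are synthetic test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) defined by four points.

    For phi, pass C(i-1), N(i), CA(i), C(i); for psi, pass N(i), CA(i),
    C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Synthetic points: eclipsed (cis), perpendicular and anti (trans) cases.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perpendicular = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, perpendicular, trans)
```

Applying the function to every residue of a structure gives the (φ, ψ) pairs that the plot displays.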
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Number   %age
Residues in most favoured regions [A,B,L]                     990   85.6%
Residues in additional allowed regions [a,b,l,p]              104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]           11    1.0%
Residues in disallowed regions                                 51    4.4%
                                                             ----  ------
Number of non-glycine and non-proline residues               1156  100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127 (59)
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
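The percentages in the plot statistics follow directly from the residue counts. A short Python check is sketched here, assuming the flattened figures are read as counts of 990, 104, 11 and 51 residues; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, read from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {region: 100.0 * n / total for region, n in counts.items()}

# PROCHECK's rule of thumb: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("over 90% in most favoured regions:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline quoted in the summary.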
Docking result (by Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
-------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
-------------------------------------------------------------------------
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp    RMS
-------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
-------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
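The size of the search reported in the log can be checked with a few lines of arithmetic. The Python sketch below rests on an interpretation of the flattened output, not on Hex documentation: it reads the `Iterate[...] x FFT[...]` line as 27 distance steps, 812 receptor rotations and 1 ligand rotation on a 64 x 24 x 48 FFT grid, and the stage timings as 53.66 s, 73.01 s, 277.87 s and 15.45 s. All variable names are illustrative.

```python
# Distance range 0.00 to 19.50 in steps of 0.75 (both end points included).
distance_steps = round((19.50 - 0.00) / 0.75) + 1

receptor_rotations = 812   # initial rotational increments for the receptor
ligand_rotations = 1       # the ligand was kept in a single orientation
fft_grid = (64, 24, 48)    # assumed reading of "FFT[642448]"

# One 3D FFT is run per (distance, receptor rotation, ligand rotation) triple.
fft_count = distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores every grid point, giving the total number of orientations.
orientations = fft_count * fft_grid[0] * fft_grid[1] * fft_grid[2]

# Stage timings in seconds, assumed reading of the "Hex ... GF ... FFT ... Scan" line.
stage_seconds = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(stage_seconds.values())

print("3D FFTs:", fft_count)
print("orientations:", orientations)
print(f"search time: {total_seconds:.2f} s")
```

Under these assumptions the figures agree with the log's own totals of 21924 FFTs, 1616412672 orientations and a search time of about 7 minutes.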
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, each plays an important role in survival in a different host species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we performed homology modelling and went a step further by presenting a theoretical model built with available online modelling tools.
The HBeAg (glycerate kinase) protein encoded by the gene is, as we have studied, one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data as well as the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases, so that possible therapeutic sites can be identified in them, and we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present and usually outnumber the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the blood stream
(Garces HBVP
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV genome:
Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Regardless of the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by blood test; it can only be isolated by analyzing an infected hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).
Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete down-regulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but the virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates past or present exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing
improvements in the quality of homology models.
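The 30% rule can be checked on any pairwise alignment. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" for gaps and that identity is counted over columns where both sequences have a residue (other denominators are also in use). The function names and the default threshold argument are illustrative and do not come from any particular tool.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True when the template exceeds the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, `percent_identity("ACDE-G", "ACQEFG")` compares five ungapped columns, four of which are identical.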
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of the "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment tools.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
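To make the Smith-Waterman idea concrete, here is a minimal sketch of the local-alignment score computed by dynamic programming. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary illustrative choices), whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

The zero in the `max` call is what makes the alignment local: a poorly scoring prefix is discarded rather than carried forward.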
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters, among others:
extinction coefficient
half-life
instability index
aliphatic index
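As an illustration of one of these parameters, the aliphatic index is commonly defined (following Ikai, 1980) as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below assumes that definition and a plain one-letter sequence string; the function name is illustrative and this is not ProtParam's own code.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains.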
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
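The percentage of correctly predicted residues in a three-state description is usually reported as a per-residue accuracy (often called Q3). A minimal sketch, assuming the predicted and observed states are given as equal-length strings over H (helix), E (strand) and C (coil); the function name is illustrative.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("state strings must not be empty")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```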
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Ran a BLAST similarity search to retrieve the PDB ID of the template structure (PDB ID 2B8N).
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
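The agreement between matched Cα atoms quoted above is normally expressed as a root mean square deviation (RMSD). The sketch below assumes the model and the reference structure have already been superimposed and that their Cα coordinates are supplied in matching order as (x, y, z) tuples in Å; it does not perform the superposition itself, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD between matched, pre-superimposed C-alpha coordinates."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have equal length")
    if not model:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(model, reference):
        total += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(total / len(model))
```

In practice the two structures are first optimally superimposed (for example with the Kabsch algorithm) before this quantity is reported.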
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
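The E-value screening step can be sketched as a simple filter over search hits. The cutoff of 1e-5 used below is an assumed illustrative value, not one prescribed by the text, and the `Hit` record and function name are hypothetical rather than part of any BLAST API.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str  # e.g. a PDB chain identifier
    score: float      # alignment score reported by the search
    e_value: float    # expected number of chance hits with this score


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)
```

Hits that survive the filter are only candidates; the coverage and plausibility checks described in the text still apply before a final template is chosen.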
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important is the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, together with the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between these torsion angles, φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
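Each point on such a plot comes from two torsion angles, and a torsion angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The following is a minimal sketch of the standard four-point dihedral calculation, assuming atom positions are given as (x, y, z) tuples; the helper names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees, -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, axis)
    d2 = _dot(b2, axis)
    v = (b0[0] - d0 * axis[0], b0[1] - d0 * axis[1], b0[2] - d0 * axis[2])
    w = (b2[0] - d2 * axis[0], b2[1] - d2 * axis[1], b2[2] - d2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))
```

Computing φ and ψ this way for every residue of a model and plotting the pairs gives the scatter that validation tools compare against the allowed regions.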
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
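A toy example may help show what "moving atoms so as to reduce the net forces on them" means. The sketch below runs steepest descent on the distance between two atoms interacting through a 12-6 Lennard-Jones potential in reduced units. The potential form, the parameters, the step size, and the cap on each move are all assumptions chosen for illustration; real force fields sum many bonded and non-bonded terms.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 Lennard-Jones energy of two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.01, tol: float = 1e-8,
                      max_iter: int = 100000, max_move: float = 0.05) -> float:
    """Steepest descent: repeatedly move downhill along the gradient."""
    r = r0
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break  # residual force is negligible: a local minimum
        move = -step * gradient
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, move))
        r += move
    return r
```

Starting from a clashing distance such as 0.9 (in units of sigma), the huge repulsion is relieved within a few steps and the pair settles at the nearest minimum, which is the behaviour described above for a single bad contact.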
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5       206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina       197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad              192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27   6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I       27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
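The two extinction coefficients can be reproduced from the composition reported above (4 Trp, 5 Tyr, 5 Cys, molecular weight 55252.6). The sketch below assumes the commonly used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine (a pair of Cys residues); the function names are illustrative and this is not ProtParam's own code.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_as_cystine: bool) -> int:
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path length."""
    return ext_coefficient / molecular_weight
```

With the counts above, the two cases give 29700 and 29450, and dividing by the molecular weight gives absorbances that round to the reported 0.538 and 0.533.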
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name                Len(aa)  SeqB Name                Len(aa)  Score
===========================================================================
1    Q8IVS8|GLCTK_HUMAN  523      2    Q64896|HBEAG_ASHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      3    P03154|HBEAG_DHBV1  305      3
1    Q8IVS8|GLCTK_HUMAN  523      4    P0C6J9|HBEAG_DHBV3  305      3
1    Q8IVS8|GLCTK_HUMAN  523      5    P03153|HBEAG_GSHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      6    P0C692|HBEAG_HBVA2  214      3
1    Q8IVS8|GLCTK_HUMAN  523      7    P0C625|HBEAG_HBVA3  214      3
1    Q8IVS8|GLCTK_HUMAN  523      8    P17099|HBEAG_HBVA4  214      3
1    Q8IVS8|GLCTK_HUMAN  523      9    Q81105|HBEAG_HBVA5  214      3
1    Q8IVS8|GLCTK_HUMAN  523      10   Q91C37|HBEAG_HBVA6  214      2
2    Q64896|HBEAG_ASHV   217      3    P03154|HBEAG_DHBV1  305      21
2    Q64896|HBEAG_ASHV   217      4    P0C6J9|HBEAG_DHBV3  305      21
2    Q64896|HBEAG_ASHV   217      5    P03153|HBEAG_GSHV   217      91
2    Q64896|HBEAG_ASHV   217      6    P0C692|HBEAG_HBVA2  214      66
2    Q64896|HBEAG_ASHV   217      7    P0C625|HBEAG_HBVA3  214      65
2    Q64896|HBEAG_ASHV   217      8    P17099|HBEAG_HBVA4  214      65
2    Q64896|HBEAG_ASHV   217      9    Q81105|HBEAG_HBVA5  214      65
2    Q64896|HBEAG_ASHV   217      10   Q91C37|HBEAG_HBVA6  214      65
3    P03154|HBEAG_DHBV1  305      4    P0C6J9|HBEAG_DHBV3  305      97
3    P03154|HBEAG_DHBV1  305      5    P03153|HBEAG_GSHV   217      24
3    P03154|HBEAG_DHBV1  305      6    P0C692|HBEAG_HBVA2  214      26
3    P03154|HBEAG_DHBV1  305      7    P0C625|HBEAG_HBVA3  214      27
3    P03154|HBEAG_DHBV1  305      8    P17099|HBEAG_HBVA4  214      25
3    P03154|HBEAG_DHBV1  305      9    Q81105|HBEAG_HBVA5  214      26
3    P03154|HBEAG_DHBV1  305      10   Q91C37|HBEAG_HBVA6  214      26
4    P0C6J9|HBEAG_DHBV3  305      5    P03153|HBEAG_GSHV   217      25
4    P0C6J9|HBEAG_DHBV3  305      6    P0C692|HBEAG_HBVA2  214      26
4    P0C6J9|HBEAG_DHBV3  305      7    P0C625|HBEAG_HBVA3  214      27
4    P0C6J9|HBEAG_DHBV3  305      8    P17099|HBEAG_HBVA4  214      25
4    P0C6J9|HBEAG_DHBV3  305      9    Q81105|HBEAG_HBVA5  214      26
4    P0C6J9|HBEAG_DHBV3  305      10   Q91C37|HBEAG_HBVA6  214      26
5    P03153|HBEAG_GSHV   217      6    P0C692|HBEAG_HBVA2  214      70
5    P03153|HBEAG_GSHV   217      7    P0C625|HBEAG_HBVA3  214      69
5    P03153|HBEAG_GSHV   217      8    P17099|HBEAG_HBVA4  214      69
5    P03153|HBEAG_GSHV   217      9    Q81105|HBEAG_HBVA5  214      69
5    P03153|HBEAG_GSHV   217      10   Q91C37|HBEAG_HBVA6  214      69
6    P0C692|HBEAG_HBVA2  214      7    P0C625|HBEAG_HBVA3  214      98
6    P0C692|HBEAG_HBVA2  214      8    P17099|HBEAG_HBVA4  214      98
6    P0C692|HBEAG_HBVA2  214      9    Q81105|HBEAG_HBVA5  214      98
6    P0C692|HBEAG_HBVA2  214      10   Q91C37|HBEAG_HBVA6  214      98
7    P0C625|HBEAG_HBVA3  214      8    P17099|HBEAG_HBVA4  214      98
7    P0C625|HBEAG_HBVA3  214      9    Q81105|HBEAG_HBVA5  214      97
7    P0C625|HBEAG_HBVA3  214      10   Q91C37|HBEAG_HBVA6  214      98
8    P17099|HBEAG_HBVA4  214      9    Q81105|HBEAG_HBVA5  214      97
8    P17099|HBEAG_HBVA4  214      10   Q91C37|HBEAG_HBVA6  214      98
9    Q81105|HBEAG_HBVA5  214      10   Q91C37|HBEAG_HBVA6  214      97
===========================================================================
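The scores table lists one score for every pair of the 10 sequences, which gives 10 x 9 / 2 = 45 rows. The sketch below shows how such a pairwise percent-identity score can be computed from two aligned sequences. It uses one common definition (identical residues divided by the columns in which both sequences have a residue); ClustalW computes its scores on its own pairwise alignments, so its numbers may differ. The two test strings are the first aligned segments of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment in this report; the function names are illustrative.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def all_pairs(names):
    """Every unordered pair of sequence names, as in the scores table."""
    return list(combinations(names, 2))


hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
print(len(all_pairs(range(1, 11))))
```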
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
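The guide tree is a Newick string: each leaf is written as name:branch_length and parentheses group the clusters. The sketch below pulls the leaf branch lengths out of the tree with a regular expression. The tree string assumes that the colons, commas, decimal points and closing semicolon were lost when the report was extracted and restores them in the usual ClustalW .dnd form; the function name is illustrative.

```python
import re

# Guide tree with the Newick punctuation restored (an assumption, see above).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a run of name characters followed by ':' and a number.
# Internal branch lengths follow ')' directly, so they have no name
# and are not matched.
LEAF_PATTERN = re.compile(r"([^(),:;]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Map every leaf name in a Newick string to its branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}\t{length:.5f}")
```

Sorting by branch length puts the human glycerate kinase first, which matches its very low pairwise scores against all the viral sequences.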
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - file - save - layer (pdb)
Choose icon - file - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens and loads the PDB file; enter the email ID at which the modeled structure is to be received)
Open the new structure (received by email) and remove the template by selecting the target
Choose icon - file - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
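The modelled residue range shows how much of the 523-residue target the homology model actually covers. A minimal sketch of that calculation follows; the function name is illustrative and not part of SWISS-MODEL.

```python
def model_coverage(first_residue, last_residue, target_length):
    """Return (number of modelled residues, percent of the target covered)."""
    modelled = last_residue - first_residue + 1  # the range is inclusive
    return modelled, 100.0 * modelled / target_length


# Residues 83 to 514 of the 523-residue target were modelled.
modelled, percent = model_coverage(83, 514, 523)
print(f"{modelled} residues modelled, {percent:.1f}% of the target")
```

The first 82 and last 9 residues of the target therefore have no coordinates in the model.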
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations of the two angles are possible for a polypeptide. For each residue, the torsion angles about the two backbone bonds on either side of its alpha carbon atom can be read from this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
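Each point on a Ramachandran plot is a pair of torsion (dihedral) angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below is a generic dihedral calculation in plain Python, not the code used by PROCHECK or SAVS; the coordinates in the example are made-up test points, not atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the outer atoms lie on the same side (cis) or on
# opposite sides (trans) of the central bond.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

Applying the function to the backbone atoms of every residue and plotting ψ against φ gives the Ramachandran plot.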
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                       Residues   %
Residues in most favoured regions [A,B,L]             990        85.6
Residues in additional allowed regions [a,b,l,p]      104        9.0
Residues in generously allowed regions [~a,~b,~l,~p]  11         1.0
Residues in disallowed regions                        51         4.4
                                                      ----       ------
Number of non-glycine and non-proline residues        1156       100.0
Number of end-residues (excl. Gly and Pro)            8
Number of glycine residues (shown as triangles)       127 (59)
Number of proline residues                            60
                                                      ----
Total number of residues                              1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
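The plot statistics can be recomputed from the residue counts and compared with the 90% guideline quoted above. The sketch below does this; the region labels and function names are chosen for illustration, and the 90% threshold is the one stated in the PROCHECK summary.

```python
# Residue counts per Ramachandran region from the PROCHECK summary
# (non-glycine, non-proline residues only).
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of residues falling in each Ramachandran region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if the most favoured regions hold more than `threshold` percent."""
    return region_percentages(counts)["most favoured"] > threshold


percentages = region_percentages(REGION_COUNTS)
for region, value in percentages.items():
    print(f"{region}: {value:.1f}%")
print("meets 90% guideline:", meets_guideline(REGION_COUNTS))
```

By this criterion the model falls short of the guideline, and the 51 residues in disallowed regions are the first places to inspect.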
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
1    1    000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
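The size of the 6D search space reported in the log can be reproduced from the other numbers in it. The sketch below assumes that the bracketed values read Iterate[27,812,1] and FFT[64,24,48] once the separators lost in extraction are restored, that is, 27 distance steps, 812 receptor rotations, 1 ligand rotation and a 64 x 24 x 48 FFT grid. The variable and function names are illustrative and are not Hex settings.

```python
def count_distance_steps(start, stop, step):
    """Number of sampled distances, with values given in hundredths of an
    angstrom so that the arithmetic stays in exact integers."""
    return (stop - start) // step + 1


# Distance range 0.00 to 19.50 A in steps of 0.75 A.
distance_steps = count_distance_steps(0, 1950, 75)

receptor_rotations = 812      # "Receptor 812" in the log
ligand_rotations = 1          # "Ligand 1" in the log
fft_grid = (64, 24, 48)       # assumed reading of FFT[...]

# One 3D FFT is run per (distance, receptor rotation, ligand rotation).
fft_count = distance_steps * receptor_rotations * ligand_rotations
orientations = fft_count * fft_grid[0] * fft_grid[1] * fft_grid[2]

print(distance_steps, fft_count, orientations)
```

The product matches both the number of 3D FFTs and the total number of orientations printed in the log, which supports the restored separators.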
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that in the near future it will be possible to draw firm conclusions about
the true picture of evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and put forward a theoretical
model built with available online modelling tools.
As we studied, the HBeAG (Glycerate kinase) protein coded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, on the basis of which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone for further
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as the
interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for
the infectious disease bird flu; they will also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid
in the new drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible
therapeutic sites can be identified in them. A similar method can also be
applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental
Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia,
Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department
of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and
alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad.
Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in genome
annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and
Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M.,
Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An
integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig
Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
In infected people, virions actually make up a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present, and
these usually outnumber the virions in a ratio of 100:1. The two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to collectively as surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes the
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the
infection. Since the non-infectious particles present the same sites as the
virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel through the bloodstream undetected by antibodies
(Garces HBVP
Hepatitis B Antigens
There are three different types of hepatitis B antigens encoded by the HBV
genome:
Hepatitis B Surface Antigen (HBsAg)- There are three different types of
hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B
surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface
proteins and has historically been known as the Australia antigen (Au
antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering
the immune response. Despite the high antigenicity and prevalence of these
particles, the immune system appears basically oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by a blood test; this antigen can only be isolated by analyzing an
infected hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm
of infected cells and is closely associated with nucleocapsid assembly
(Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early appearance
during an acute HBV infection. Thought to be located in the core structure of
the virus particle, this antigen can be detected by a blood test. If found, it
is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus
(HBV), which causes a necroinflammatory liver disease of variable duration and
severity. Chronically infected patients with active liver disease carry a high
risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. The serum of individuals infected with
hepatitis B contains three distinct antigenic particles: a spherical 22 nm
particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane
particle, and tubular or filamentous particles that vary in length. The Dane
particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid and polymerase antigens eliminates infected cells. The
dominant cause of viral persistence during HBV infection is the development of
a weak antiviral immune response to the viral antigens. While neonatal
tolerance probably plays an important role in viral persistence in patients
infected at birth, the basis for poor responsiveness in adult-onset infection
is not well understood and requires further analysis. Viral evasion by epitope
inactivation and T cell receptor antagonism may contribute to the worsening of
viral persistence in the setting of an ineffective immune response, as can the
incomplete down-regulation of viral gene expression and the infection of
immunologically privileged tissues. Chronic liver cell injury and the
attendant inflammatory and regenerative responses create the mutagenic and
mitogenic stimuli for the development of DNA damage that can cause
hepatocellular carcinoma. Elucidation of the immunological and virological
basis for HBV persistence may yield immunotherapeutic and antiviral strategies
to terminate chronic HBV infection and reduce the risk of its life-threatening
sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of
the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure
with cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions
leading to acute and chronic disease are still poorly understood. Studies on
how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing
inflammation of the liver are still at an early stage, and the role of the
virus in liver cancer is still elusive. The state of knowledge in this very
active field is therefore reviewed, with an emphasis on past accomplishments
as well as goals for the future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and
indicates immunity following infection. It remains detectable for life and is
not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic
carriers and in those who clear the infection. Its presence indicates exposure
to HBV.
Homology, or comparative, modeling involves the prediction of the structure of
a query sequence from the structures of one or more structural templates. The
procedure involves the identification of possible templates that have a clear
sequence relationship to the query, the assembly of the model, the prediction
of regions of the structure that are likely to have different conformations
than the templates (e.g. loops), and ultimately the refinement of the
structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures
heavily as a rationale for structural genomics initiatives, under the stated
assumption that accurate models can be built for query sequences that have a
greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation for
this observation has been offered by Chung and Subbiah 1996.) Advances in the
accuracy of sequence alignments using structure-based profile methods, such as
those described above, should result in continuing improvements in the quality
of homology models.
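The 30% identity threshold discussed above can be checked directly on a pairwise alignment. The following is a minimal sketch only: the function names, the use of '-' as the gap character, and the choice to count identity over gap-free columns are assumptions made for illustration and are not taken from the original study. Other identity definitions (for example, dividing by the length of the shorter sequence) give different values.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_identity_rule(aligned_query: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """True if the template exceeds the (assumed) identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment, not real HBV or glycerate kinase data.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

A template that falls below the threshold is not necessarily useless, but the alignment, and therefore the model, should then be treated with more caution.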
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening
have become increasingly important. Alongside this explosion of structural
information, a number of molecular docking methods have been developed over
the last years with the aim of maximally exploiting all available structural
and chemical information that can be derived from proteins, from ligands, and
from protein-ligand complexes. In this respect, the term "guided docking" is
introduced to refer to docking approaches that incorporate some degree of
chemical information to actively guide the orientation of the ligand into the
binding site. To reflect the focus on the use of chemical information, a
classification scheme for guided docking approaches is proposed. In general
terms, guided docking approaches can be divided into indirect and direct
approaches. Indirect approaches incorporate chemical information implicitly,
having an effect on scoring but not on orienting the ligand during sampling.
In contrast, direct approaches incorporate chemical information explicitly,
thus actively guiding the orientation of the ligand during sampling. Direct
approaches can be further divided into protein-based, mapping-based, and
ligand-based approaches to reflect the source used to derive the features
capturing the chemical information inside the protein cavity. Within each
category, a representative list of docking approaches is discussed. In view of
the limitations of current scoring functions, it was generally found that
making optimal use of chemical information represents an efficient
knowledge-based strategy for improving binding affinity estimations, ligand
binding-mode predictions, and virtual screening enrichments obtained from
protein-ligand docking.
This review gives an introduction into ligand-receptor docking and illustrates
the basic underlying concepts. An overview of different approaches and
algorithms is provided. Although the application of docking and scoring has
led to some remarkable successes, there are still some major challenges ahead,
which are outlined here as well. Approaches to address some of these
challenges and the latest developments in the area are presented. Some aspects
of the assessment of docking program performance are discussed. A number of
successful applications of structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the development
of computational tools and databases, and the application of these tools and
databases in generating biological knowledge to better understand living
systems. These tools are used in three areas of genomic and molecular
biological research: molecular sequence analysis, molecular structural
analysis, and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information,
NCBI creates public databases, conducts research in computational biology,
develops software tools for analyzing genome data, and disseminates biomedical
information, all for the better understanding of molecular processes affecting
human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its domain
structure, post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other databases.
2 Protein sequence-
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described
(as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article
"Rapid and sensitive protein similarity searches". The original FASTP program
was designed for protein sequence similarity searching. FASTA, described in
1988 ("Improved Tools for Biological Sequence Comparison"), added the ability
to do DNA:DNA searches and translated protein:DNA searches, and also provided
a more sophisticated shuffling program for evaluating statistical
significance. There are several programs in this package that allow the
alignment of protein sequences and DNA sequences. FASTA is pronounced
"FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an
extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide
searches. Recent versions of the FASTA package include special translated
search algorithms that correctly handle frameshift errors (which
six-frame-translated searches do not handle very well) when comparing
nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major
focus of the package is the calculation of accurate similarity statistics, so
that biologists can judge whether an alignment is likely to have occurred by
chance or whether it can be used to infer homology. The FASTA package is
available from fasta.bioch.virginia.edu
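To make the idea of an optimal local alignment concrete, the sketch below implements the Smith-Waterman recurrence in plain Python. It is a teaching sketch under simplifying assumptions and is not the SSEARCH implementation: it uses a fixed match/mismatch score and a linear gap penalty (the default values are arbitrary illustrative choices), whereas real searches normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences chosen only to show the shared local region.
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

The raw score alone does not say whether a match is meaningful, which is why the package described above puts so much weight on similarity statistics.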
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an
algorithm for comparing primary biological sequence information, such as the
amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query sequence
with a library or database of sequences and identify library sequences that
resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced
from a protein sequence. No additional information is required about the
protein under consideration. The protein can be specified either as a
Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence.
White space and numbers are ignored. If you provide the accession number of a
Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that
allows you to select the portion of the sequence on which you would like to
perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by
clicking on the positions), as well as the possibility to enter start and end
positions in two boxes. By default (i.e. if you leave the two boxes empty),
the complete sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
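Two of the parameters listed above can be reproduced with a few lines of code. The sketch below is an illustration, not ProtParam itself: the function names are hypothetical, and the constants are the commonly published ones as far as we are aware (aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) in mole percent; molar extinction coefficient at 280 nm from 1490 per Tyr, 5500 per Trp and 125 per cystine). The half-life and the instability index depend on lookup tables and are deliberately left out.

```python
from collections import Counter


def clean_sequence(raw: str) -> str:
    """Drop white space and digits, mirroring how a raw sequence is accepted."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def aliphatic_index(raw: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = clean_sequence(raw)
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * counts[residue] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(raw: str, cystines_formed: bool = False) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    If cystines_formed is True, every pair of Cys is assumed to form a cystine.
    """
    counts = Counter(clean_sequence(raw))
    value = 1490 * counts["Y"] + 5500 * counts["W"]
    if cystines_formed:
        value += 125 * (counts["C"] // 2)
    return value


if __name__ == "__main__":
    # Toy peptide, not the glycerate kinase sequence.
    print(aliphatic_index("AVIL"), extinction_coefficient("YWCC", True))
```

Values computed this way should be compared against the ProtParam output for the same sequence before being reported.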
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has
been described to improve the success rate in the prediction of the secondary
structure of proteins. In this paper we report improvements brought about by
predicting all the sequences of a set of aligned proteins belonging to the
same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of
amino acids for a three-state description of the secondary structure (α-helix,
β-sheet and coil) in a whole database containing 126 chains of non-homologous
(less than 25% identity) proteins. Joint prediction with SOPMA and a neural
networks method (PHD) correctly predicts 82.2% of residues for 74% of
co-predicted amino acids. Predictions are available by Email to
deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and the other checks available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of
a protein from its amino acid sequence (the "query sequence" or "target").
Almost all homology modeling techniques rely on the identification of one or
more known protein structures (known as "templates" or "parent structures")
likely to resemble the structure of the query sequence, and on the production
of an alignment that maps residues in the query sequence to residues in the
template sequence. The sequence alignment and template structure are then used
to produce a structural model of the target. Because protein structures are
more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence
alignment and template structure. The approach can be complicated by the
presence of alignment gaps (commonly called indels) that indicate a structural
region present in the target but not in the template, and by structure gaps in
the template that arise from poor resolution in the experimental procedure
(usually X-ray crystallography) used to solve the structure. Model quality
declines with decreasing sequence identity; a typical model has ~2 Å agreement
between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement
at 25% sequence identity. Regions of the model that were constructed without a
template, usually by loop modeling, are generally much less accurate than the
rest of the model, particularly if the loop is long. Errors in side chain
packing and position also increase with decreasing identity, and variations in
these packing configurations have been suggested as a major reason for poor
model quality at low identity [2]. Taken together, these various
atomic-position errors are significant and impede the use of homology models
for purposes that require atomic-resolution data, such as drug design and
protein-protein interaction predictions; even the quaternary structure of a
protein may be difficult to predict from homology models of its subunit(s).
Nevertheless, homology models can be useful in reaching qualitative
conclusions about the biochemistry of the query sequence, especially in
formulating hypotheses about why certain residues are conserved, which may in
turn lead to experiments to test those hypotheses. For example, the spatial
arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small
molecule, or to foster association with another protein or nucleic acid.
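The Cα agreement figures quoted above are usually expressed as a root-mean-square deviation (RMSD) over matched atoms. The sketch below shows the calculation under a strong assumption: the two coordinate sets are already superimposed and listed in the same residue order. The function name and the tuple-based coordinate format are hypothetical, and no optimal superposition (for example the Kabsch algorithm) is performed.

```python
import math


def ca_rmsd(coords_model, coords_reference) -> float:
    """RMSD in the input units (e.g. Angstrom) between matched C-alpha atoms.

    Each argument is a sequence of (x, y, z) tuples; entry k of one list must
    correspond to entry k of the other, and both structures are assumed to be
    superimposed already.
    """
    if len(coords_model) != len(coords_reference):
        raise ValueError("coordinate lists must have the same length")
    if not coords_model:
        raise ValueError("no atoms to compare")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_model, coords_reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(coords_model))


if __name__ == "__main__":
    # Toy coordinates: every atom displaced by 2 units along z.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```

Without a prior superposition the value also reflects the arbitrary placement of the two structures, so it is only meaningful after fitting.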
Figure: First, the known template 3D structures are aligned with the target
sequence to be modelled. Second, spatial features, such as Cα-Cα distances,
hydrogen bonds, and main chain and side chain dihedral angles, are transferred
from the templates to the target. Thus, a number of spatial restraints on its
structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target
and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence identity,
derive from errors in the initial sequence alignment and from improper
template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment
known as the Critical Assessment of Techniques for Protein Structure
Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided by
database search techniques such as FASTA and BLAST. More sensitive methods
based on multiple sequence alignment, of which PSI-BLAST is the most common
example, iteratively update their position-specific scoring matrix to
successively identify more distantly related homologs. This family of methods
has been shown to produce a larger number of potential templates and to
identify better templates for sequences that have only distant relationships
to any solved structure. Protein threading, also known as fold recognition or
3D-1D alignment, can also be used as a search technique for identifying
templates to be used in traditional homology modeling methods. When performing
a BLAST search, a reliable first approach is to identify hits with a
sufficiently low E-value, which are considered sufficiently close in evolution
to make a reliable homology model. Other factors may tip the balance in
marginal cases; for example, the template may have a function similar to that
of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the
only one available, since it may well have a wrong structure, leading to the
production of a misguided model. A better approach is to submit the primary
sequence to fold-recognition servers or, better still, consensus meta-servers,
which improve upon individual fold-recognition servers by identifying
similarities (consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from multiple
templates, most methods rely on a single template. Therefore, choosing the
best template from among the candidates is a key step and can affect the final
accuracy of the structure significantly. This choice is guided by several
factors, such as the similarity of the query and template sequences, of their
functions, and of the predicted query and observed template secondary
structures. Perhaps most important are the coverage of the aligned regions,
that is, the fraction of the query sequence structure that can be predicted
from the template, and the plausibility of the resulting model. Thus,
sometimes several homology models are produced for a single query sequence,
with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper fit
between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and smoother
than those in protein-ligand interactions. Protein-protein interactions are
usually more rigid; the interfaces of these interactions do not have the
ability to alter their conformation in order to improve binding and ease
movement. Conformational changes are limited by steric constraints, and the
interfaces are thus said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to
as a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or
a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows a receptor
to bind a larger than usual ligand. Normally, when there is ligand overlap in
the docking interface, energy penalties are incurred. If the van der Waals
forces can be decreased, energy loss in the system will be minimized. This can
be accomplished by allowing flexibility in the receptor. Flexible receptors
allow for docking of a larger ligand than would be possible with a rigid
receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the
receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though; there is intrinsic movement
which allows for small conformational adaptation for ligand binding. When the
six degrees of freedom for protein movement are taken into consideration
(three rotational, three translational), the amount of inherent flexibility
allowed the receptor is even greater. This further offsets any energy penalty
between the receptor and ligand, allowing for easier, more energetically
favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas
for further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can aid in
the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components.
It is a molecule in or on the cell surface to which a substance (such as a
hormone or a drug) selectively binds, causing a change in the activity of the
cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with the
binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called the
catalytic groups. In essence, the interaction of the enzyme and substrate at
the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how often
particular amino acids were in contact with bound non-protein atoms in protein
three-dimensional structures. Positive values mean that the amino acid makes
more contacts than one would expect by chance; negative values mean that it
makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate; the rotations about these
bonds are the torsion angles φ and ψ, respectively. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. Angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
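The torsion angles plotted in a Ramachandran plot can be computed from four consecutive backbone atom positions. The following is a minimal sketch, not the method of any particular program: the function name `dihedral` and the test coordinates are illustrative assumptions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond.

    0 degrees is the cis arrangement, +/-180 degrees is trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: four points forming a planar cis arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

Plotting the (φ, ψ) pair of every residue on a pair of axes running from -180 to 180 degrees gives the Ramachandran plot described above.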
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
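The idea of gradient-based minimization can be illustrated on the simplest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) potential. This is only a sketch under stated assumptions, not the procedure of any modeling package: the reduced units (epsilon = sigma = 1), the step size, the per-step displacement cap and the function names are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr; the force on the pair is -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r0, step=0.01, max_move=0.05, tol=1e-10, max_iter=10000):
    """Steepest descent: move downhill along the gradient until the force vanishes."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g
        # Cap the displacement so a huge repulsive force cannot fling the atoms apart.
        move = max(-max_move, min(max_move, move))
        r += move
    return r

r_start = 0.9                      # atoms too close: large repulsion energy
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

Starting from the clashing separation, the energy drops to the bottom of the nearest well (separation 2^(1/6) sigma, energy -epsilon). On a real molecule the same downhill search stops at whichever local minimum is nearest, which is the limitation noted above.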
Result and discussion
1. Swiss-Prot entry: Protein sequence, Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                 27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
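The two extinction coefficients can be reproduced from the composition given above (5 Tyr, 4 Trp, 5 Cys) and the molecular weight. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the function names are illustrative, not part of any tool.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1), as documented by ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125

def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Extinction coefficient of the native protein at 280 nm."""
    # Each cystine is a disulfide formed by two Cys; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return EXT_TYR * n_tyr + EXT_TRP * n_trp + EXT_CYSTINE * n_cystine

def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight

mw = 55252.6
for paired in (True, False):
    ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, all_cys_paired=paired)
    print(ext, round(absorbance_1_g_per_l(ext, mw), 3))
```

With these inputs the sketch gives 29700 and 0.538 when all Cys are paired and 29450 and 0.533 when none are, in agreement with the reported values.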
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3_10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
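The pairwise scores in this table are reported as percentages of identical positions. As a rough illustration of what such a score measures, the sketch below counts identical residues over the columns of two already-aligned sequences in which neither has a gap. This is an assumption-laden simplification: ClustalW derives its scores from its own pairwise alignments, so the numbers need not match exactly, and the function name `percent_identity` is hypothetical. The two fragments are the first aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5 from the multiple alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# N-terminal aligned fragments taken from the alignment (one V/F difference).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 1))
```

Applied to full-length alignments, a calculation of this kind separates the closely related HBeAg sequences (scores in the 90s) from the unrelated human glycerate kinase (scores of 2 to 3).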
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
open the template structure from file (pdb file)
choose icon - Swiss model - load the raw target sequence
choose icon - fit - fit raw sequence, then magic fit, then iterative fit
choose icon - file - save - layer (pdb)
choose icon - file - save - project (pdb)
choose icon - Swiss model - submit modeling request (a new browser window will open; load the pdb file and give the Email ID for receiving the modeled structure)
open the new structure (received by Email) - remove the template by selecting the target
choose icon - file - save - layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig.: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        Score    %-age
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
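The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The sketch below is illustrative only: the function name, the dictionary keys and the `threshold` parameter are assumptions, not part of PROCHECK.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed, threshold=90.0):
    """Return region percentages and whether the most favoured share exceeds the threshold."""
    counts = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    # Percentages are taken over non-glycine, non-proline, non-end residues only.
    total = sum(counts.values())
    percents = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return percents, percents["most favoured"] > threshold

percents, good_quality = ramachandran_summary(990, 104, 11, 51)
print(percents, good_quality)
```

For the counts reported here, 85.6% of residues fall in the most favoured regions, which is below the 90% expected of a good quality model.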
Docking result by Hex software
Fig.: Ligand & Receptor (2B8N)
Fig.: After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of evolution, and the genes responsible for pathogenesis can
also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the
secondary reasons for the pathogenicity of HBV. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity, as
a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone for such
discoveries. The present work may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease hepatitis B; they will also open new avenues for
finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller,
W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas,
M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000.
InterPro: An integrated documentation resource for protein families,
domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B particle
forms and appears to be manufactured by the virus in high quantities. It
also contains a highly antigenic epitope, which may be responsible for
triggering the immune response. Regardless of the high antigenicity and
prevalence of these particles, the immune system appears basically
oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. A 185 amino acid protein expressed in the
cytoplasm of infected cells, it is highly associated with nucleocapsid
assembly (Strauss 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus, this antigen can be detected by blood test. If
found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver
disease carry a high risk of developing cirrhosis and hepatocellular
carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. The serum of
individuals infected with hepatitis B contains three distinct antigen
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and
DNA polymerase) called the Dane particle, and tubular or filamentous
particles that vary in length. The Dane particle is the infective form of
the virus. Hepatitis B is normally transmitted by blood transfusion,
contaminated equipment, drug users' unsterile needles, or any body
secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires
further analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
down-regulation of viral gene expression and the infection of
immunologically privileged tissues. Chronic liver cell injury and the
attendant inflammatory and regenerative responses create the mitogenic and
mutagenic stimuli for the development of DNA damage that can cause
hepatocellular carcinoma. Elucidation of the immunological and virological
basis for HBV persistence may yield immunotherapeutic and antiviral
strategies to terminate chronic HBV infection and reduce the risk of its
life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still
poorly understood. Studies on how the virus evades the immune response to
cause prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The state
of knowledge in this very active field is therefore reviewed with an
emphasis on past accomplishments as well as goals for the future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural
templates. The procedure involves the identification of possible templates
that have a clear sequence relationship to the query, the assembly of the
model, the prediction of regions of the structure that are likely to have
different conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent
differences between the template and query structures. As mentioned above,
homology modeling figures heavily as a rationale for structural genomics
initiatives, under the stated assumption that accurate models can be built
for query sequences that have a greater than 30% sequence identity with
their best template.
The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
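The 30% rule can be checked directly on a pairwise alignment. The following is a minimal Python sketch, not the procedure used in this report: it assumes the two sequences are already aligned with '-' as the gap character, and it takes the number of gap-free aligned positions as the denominator (other definitions of sequence identity use a different denominator). The function names are illustrative only.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    aligned_columns = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns that contain a gap
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


def passes_identity_rule(identity: float, threshold: float = 30.0) -> bool:
    """True if the identity is above the rule-of-thumb threshold (30% here)."""
    return identity > threshold
```

For example, percent_identity("ACDE-FG", "ACDQWFG") counts six gap-free columns, five of which match.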
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on
the use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring
but not on orienting the ligand during sampling. In contrast, direct
approaches incorporate chemical information explicitly, thus actively
guiding the orientation of the ligand during sampling. Direct approaches
can be further divided into protein-based, mapping-based, and ligand-based
approaches to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that
making optimal use of chemical information represents an efficient
knowledge-based strategy for improving binding affinity estimations, ligand
binding-mode predictions, and virtual screening enrichments obtained from
protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some
major challenges ahead, which are outlined here as well. Approaches to
address some of these challenges and the latest developments in the area
are presented. Some aspects of the assessment of docking program
performance are discussed. A number of successful applications of
structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface
between computer science and biological science. It involves the technology
that uses computers for storage, retrieval, manipulation, and distribution
of information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis
of genes and genomes and their corresponding products, and is often
considered computational molecular biology. It consists of two subfields:
the development of computational tools and databases, and the application
of these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2 Protein sequence- Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The
original FASTP program was designed for protein sequence similarity
searching. FASTA, described in 1988 ("Improved Tools for Biological
Sequence Comparison"), added the ability to do DNA:DNA searches and
translated protein:DNA searches, and also provided a more sophisticated
shuffling program for evaluating statistical significance. There are
several programs in this package that allow the alignment of protein
sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for
"FAST-All", because it works with any alphabet, an extension of "FAST-P"
(protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift
errors (which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
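To illustrate the optimal local alignment idea behind the Smith-Waterman algorithm mentioned above, here is a minimal Python sketch that returns only the best local alignment score. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas production tools typically use substitution matrices and more elaborate gap models. All parameter values are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 3,
                         mismatch: int = -3, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                  # start a new local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,               # gap in b
                current_row[j - 1] + gap,            # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best
```

Because every cell is floored at zero, the score never goes negative, which is what makes the alignment local rather than global.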
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either
as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the
positions), as well as the possibility to enter start and end positions in
two boxes. By default (i.e. if you leave the two boxes empty), the complete
sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
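As an illustration of how one of these parameters is derived from the sequence alone, the aliphatic index is commonly defined as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The Python sketch below applies that formula; it is an assumption-labelled illustration, not ProtParam's own code, and the function name is hypothetical. Following the description above, white space and numbers in the input are ignored.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    # Keep letters only, so white space and position numbers are ignored.
    residues = "".join(ch for ch in sequence if ch.isalpha()).upper()
    if not residues:
        raise ValueError("sequence contains no residues")

    def mole_percent(residue: str) -> float:
        return 100.0 * residues.count(residue) / len(residues)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains.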
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of
the secondary structure (α-helix, β-sheet, and coil) in a whole database
containing 126 chains of non-homologous (less than 25% identity) proteins.
Joint prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
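The success rates quoted above are per-residue accuracies for a three-state description. A minimal Python sketch of such a measure follows; it assumes the predicted and observed structures are given as equal-length strings over the hypothetical alphabet H (helix), E (sheet), and C (coil), which is an illustrative encoding and not the SOPMA output format.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be H, E or C")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(predicted)
```

For instance, a prediction that agrees with the observed states at seven of eight residues scores 87.5.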
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure by a BLAST similarity search (PDB ID 2B8N).
Loaded the target sequence in SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
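Agreement between matched Cα atoms is usually reported as a root-mean-square deviation (RMSD) in Å. The Python sketch below computes that quantity under two stated assumptions: the model and reference coordinates are already superimposed, and the two lists contain the matched Cα atoms in the same order. It does not perform the superposition itself, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD between matched, pre-superimposed C-alpha coordinates."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    if not model:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))
```

Two atoms that are each displaced by 2 Å from their reference positions give an RMSD of 2 Å.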
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation
of a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds.
The chief inaccuracies in homology modeling, which worsen with lower
sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure
prediction, current practice in homology modeling is assessed in a biennial
large-scale experiment known as the Critical Assessment of Techniques for
Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
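The E-value screening described above can be sketched in a few lines of Python. This is only an illustration: the hit names, the `Hit` record, the `select_templates` function and the cutoff of 1e-5 are all assumptions made for the example, not values or tools taken from this project.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One database search hit (hypothetical record layout)."""
    template: str
    bit_score: float
    e_value: float


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, lowest E-value first.

    The default cutoff is an assumed, commonly used value; it should be
    tuned to the database size and the question being asked.
    """
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


# Made-up hit list, for illustration only.
hits = [
    Hit("tmplA", 206.0, 6e-54),
    Hit("tmplB", 27.0, 6.0),
    Hit("tmplC", 197.0, 2e-51),
]

candidates = select_templates(hits)
for h in candidates:
    print(h.template, h.bit_score, h.e_value)
```

A hit that fails the cutoff is dropped rather than kept as a last resort, which mirrors the advice above that a template with a poor E-value should generally not be chosen.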
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can
be predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows a receptor
to bind a larger-than-usual ligand. Normally, when there is ligand overlap in
the docking interface, energy penalties are incurred. If the van der Waals
repulsion can be decreased, the energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement that allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on the cell surface to which a substance (such
as a hormone or a drug) selectively binds, causing a change in the activity of
the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a
ligand binds through the interaction of many weak noncovalent bonds formed
with the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The table below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His   0.360    Tyr  -0.040    Asp   0.045    Gly  -0.070
Trp  -0.140    Met   0.025    Val  -0.060    Asn   0.080
Leu  -0.180    Phe  -0.120    Gln   0.050    Cys   0.210
Ile  -0.005    Ala   0.025    Glu   0.050    Arg   0.055
Pro  -0.200    Lys   0.100    Thr   0.100    Ser   0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi)
of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
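The φ and ψ values plotted in a Ramachandran diagram are ordinary torsion (dihedral) angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The Python sketch below shows one standard way to compute such an angle from four 3D points; the function names and the toy coordinates are illustrative assumptions, and this is not the code used by any validation server.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / b2_len, b2[1] / b2_len, b2[2] / b2_len)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2_unit)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not from a real protein): the first three points are fixed
# and the fourth is rotated about the central bond.
cis_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
quarter_turn = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
trans_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis_angle, quarter_turn, abs(trans_angle))
```

Applying this function to the backbone atoms of every residue and plotting the resulting (φ, ψ) pairs gives the scatter that is compared against the allowed regions.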
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run; the run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths and
bond angles, as derived from a recent and comprehensive analysis of small
molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed
residue-by-residue listing. It generates a number of output files in the default
directory, which have the same name as the original PDB file but different
extensions. The residue-by-residue listing has a .out extension and lists all the
computed stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
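The idea of gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential, with the interatomic distance as the only degree of freedom. The sketch below is a toy steepest-descent loop under assumed reduced units (epsilon = sigma = 1) and assumed step-size and tolerance values; real minimizers work on all atomic coordinates with a full force field.

```python
EPSILON = 1.0  # well depth (assumed reduced units)
SIGMA = 1.0    # distance at which the potential crosses zero (assumed)


def lj_energy(r):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (s6 * s6 - s6)


def lj_gradient(r):
    """Derivative dE/dr; the force on the pair is minus this value."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r_start, step=1e-3, tol=1e-6, max_move=0.1, max_steps=100000):
    """Steepest descent: move downhill until the gradient is nearly zero."""
    r = r_start
    for _ in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        move = -step * g
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, move))
        r += move
    return r


# Start from a clashing geometry: the two atoms are too close together.
r_clash = 0.9
r_min = minimize(r_clash)
e_clash = lj_energy(r_clash)
e_min = lj_energy(r_min)
print(r_clash, e_clash)
print(r_min, e_min)
```

The starting geometry has a positive (repulsive) energy dominated by one bad contact; after minimization the pair sits at the bottom of the nearest well, where the remaining force is close to zero. The loop only ever walks downhill, which is why this kind of minimization ends in the nearest local minimum and cannot cross energy barriers.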
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
  Entry name: GLCTK_HUMAN
  Primary accession number: Q8IVS8
Name and origin of the protein
  Protein name: Glycerate kinase
  Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
  Gene name: GLYCTK
  Gene synonyms: HBEBP4
  ORF names: LP5910
  From: Homo sapiens (Human) [TaxID: 9606]
  Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
  Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                         Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
  Residue   Count   Percent
  Ala (A)    74     14.1%
  Arg (R)    33      6.3%
  Asn (N)    11      2.1%
  Asp (D)    21      4.0%
  Cys (C)     5      1.0%
  Gln (Q)    32      6.1%
  Glu (E)    28      5.4%
  Gly (G)    51      9.8%
  His (H)    16      3.1%
  Ile (I)    15      2.9%
  Leu (L)    81     15.5%
  Lys (K)    10      1.9%
  Met (M)    12      2.3%
  Phe (F)    10      1.9%
  Pro (P)    28      5.4%
  Ser (S)    27      5.2%
  Thr (T)    22      4.2%
  Trp (W)     4      0.8%
  Tyr (Y)     5      1.0%
  Val (V)    38      7.3%
  Pyl (O)     0      0.0%
  Sec (U)     0      0.0%
  (B)         0      0.0%
  (Z)         0      0.0%
  (X)         0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
  Carbon    C   2435
  Hydrogen  H   3967
  Nitrogen  N    711
  Oxygen    O    719
  Sulfur    S     17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
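The two extinction coefficients above can be cross-checked from the amino acid composition. The sketch below assumes the molar extinction values commonly quoted for this kind of estimate at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and a molecular weight of 55252.6 Da; the function names are made up for the illustration and this is not ProtParam's own code.

```python
# Assumed per-residue molar extinction values at 280 nm (M-1 cm-1).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cys, cys_paired):
    """Estimate the 280 nm extinction coefficient from residue counts.

    If cys_paired is True, every possible pair of Cys residues is counted
    as one cystine (an odd Cys is left unpaired).
    """
    n_cystine = n_cys // 2 if cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


# Counts taken from the composition listed above: Tyr 5, Trp 4, Cys 5.
MOLECULAR_WEIGHT = 55252.6  # assumed reading of the reported value

ext_paired = extinction_coefficient(5, 4, 5, cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, cys_paired=False)
abs_paired = round(absorbance_at_1_g_per_l(ext_paired, MOLECULAR_WEIGHT), 3)
abs_reduced = round(absorbance_at_1_g_per_l(ext_reduced, MOLECULAR_WEIGHT), 3)
print(ext_paired, abs_paired)
print(ext_reduced, abs_reduced)
```

Under these assumptions the calculation reproduces the reported pairs 29700 / 0.538 and 29450 / 0.533.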
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
  Alpha helix     (Hh): 235 is 44.93%
  3-10 helix      (Gg):   0 is  0.00%
  Pi helix        (Ii):   0 is  0.00%
  Beta bridge     (Bb):   0 is  0.00%
  Extended strand (Ee):  80 is 15.30%
  Beta turn       (Tt):  36 is  6.88%
  Bend region     (Ss):   0 is  0.00%
  Random coil     (Cc): 172 is 32.89%
  Ambiguous states    :   0 is  0.00%
  Other states        :   0 is  0.00%
Parameters:
  Window width: 17
  Similarity threshold: 8
  Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1) Number of sequences: 10
2) Alignment score: 28565
3) Sequence format: Pearson
4) Sequence type: aa
5) Output file: clustalw2-20080510-09552541.output
6) Alignment file: clustalw2-20080510-09552541.aln
7) Guide tree file: clustalw2-20080510-09552541.dnd
8) Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
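The pairwise scores in this table behave like percent identities: closely related HBeAg sequences score in the high 90s, while the human glycerate kinase scores 2 to 3 against every viral sequence. A minimal Python sketch of such a score is given below, assuming it is computed as identical positions divided by the positions where neither sequence has a gap; the function name is hypothetical, the two fragments are short pieces copied from the alignment of this report, and the exact formula used by ClustalW2 may differ in detail.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (matches, compared, percent) for two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return matches, compared, percent


# Short aligned fragments of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5.
frag_hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
frag_hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"

matches, compared, percent = percent_identity(frag_hbva4, frag_hbva5)
print(matches, compared, round(percent, 1))
```

On a fragment this short the value is only indicative; the scores in the table are computed over the full pairwise alignments.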
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - Save - Layer (pdb).
5) Choose File - Save - Project (pdb).
6) Choose Swiss model - submit modeling request. (A new browser window opens for loading the pdb file and entering the email ID at which the modeled structure will be received.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044    Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the φ and ψ backbone conformational angles for each residue in a protein. The plot thus characterises the local backbone geometry between successive alpha-carbon atoms of the protein.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                        No. of residues   %-tage
Residues in most favoured regions [A,B,L]                     990          85.6%
Residues in additional allowed regions [a,b,l,p]              104           9.0%
Residues in generously allowed regions [~a,~b,~l,~p]           11           1.0%
Residues in disallowed regions                                 51           4.4%
                                                             ----         ------
Number of non-glycine and non-proline residues               1156         100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127 (59)
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
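The percentages in the plot statistics follow directly from the residue counts, and the 90% rule of thumb can be applied to them mechanically. The sketch below recomputes them in Python; the dictionary keys and variable names are chosen for this illustration only, and the counts are those reported above.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Only non-glycine, non-proline, non-terminal residues are counted here.
total = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / total, 1)
    for region, count in region_counts.items()
}

# Rule of thumb quoted above: more than 90% in the most favoured regions.
GOOD_QUALITY_THRESHOLD = 90.0
meets_threshold = percentages["most favoured"] > GOOD_QUALITY_THRESHOLD

for region, pct in percentages.items():
    print(region, region_counts[region], pct)
print("total", total, "meets 90% criterion:", meets_threshold)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, which is worth keeping in mind when interpreting the docking results that follow.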
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
75
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
77
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000       00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1 000000       00      00      00      00      00     00  -1  -100
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will soon be possible to draw firm conclusions about the true picture
of evolution, and that the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of a protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene was
identified as one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be a stepping stone towards such
discoveries, although it is only a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental
Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia,
Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia University,
New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W.,
and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of
protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M.,
Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An
integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. The serum of individuals
infected with hepatitis B contains three distinct antigenic particles: a
spherical 22 nm particle, a 42 nm particle (containing DNA and DNA
polymerase) called the Dane particle, and tubular or filamentous particles
that vary in length. The Dane particle is the infective form of the virus.
Hepatitis B is normally transmitted by blood transfusion, contaminated
equipment, drug users' unsterile needles, or any body secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness in
adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete downregulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g., loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
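
The 30% rule described above can be illustrated with a short calculation. The following is only a minimal sketch and not part of the workflow used in this report: the function names and the example alignment are hypothetical, and percent identity is computed here over aligned (non-gap) columns, which is just one of several conventions in use.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted (an assumed convention)
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def above_identity_threshold(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """True if the pair exceeds the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment: 6 identical residues in 8 aligned columns.
print(percent_identity("MKT-AYIAK", "MKTLAYLAR"))
```

A template whose identity to the query falls below the threshold would, under the rule stated above, be expected to give a less reliable alignment and therefore a less reliable model.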
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domains structure, post-translational modifications, variants, etc.), a
minimal level of redundancy, and a high level of integration with other
databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4). Primary accession number: Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
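
To make the idea of an optimal local alignment concrete, the Smith-Waterman recurrence can be sketched in a few lines. This is an illustrative sketch only, not the SSEARCH implementation: the function name is hypothetical, and the match, mismatch, and linear gap scores are assumed values chosen for the example, whereas real searches typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only the core "ACGT" score 4 matches x 2 = 8.
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

The raw score alone does not say whether a match is meaningful; that is the role of the similarity statistics mentioned above.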
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e., if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
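
Two of the parameters listed above follow simple published formulas and can be sketched directly. The code below is a minimal illustration, not ProtParam's own implementation: the function names are hypothetical, the aliphatic index uses Ikai's formula (mole percent of Ala + 2.9 x Val + 3.9 x (Ile + Leu)), and the extinction coefficient at 280 nm uses the commonly cited per-residue values of 1490 for Tyr, 5500 for Trp, and 125 per cystine. The number of cystines is an input the caller must assume, since it cannot be read from the sequence alone.

```python
from collections import Counter


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * counts[residue] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(sequence: str, cystines: int = 0) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the assumed number of disulfide-bonded Cys pairs;
    the default of 0 corresponds to all cysteines being reduced.
    """
    counts = Counter(sequence.upper())
    return 1490 * counts["Y"] + 5500 * counts["W"] + 125 * cystines


# Hypothetical toy peptides, used only to exercise the formulas.
print(aliphatic_index("AVIL"))
print(extinction_coefficient("WYYC"))
```

Half-life and the instability index depend on lookup tables (N-terminal residue and dipeptide weights respectively) and are therefore not reproduced in this sketch.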
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence (pdb format) into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified the model against several parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
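The "agreement between the matched Cα atoms" quoted above is normally reported as a root-mean-square deviation (RMSD). A minimal sketch of that calculation follows; it assumes the two coordinate lists are already superimposed and listed in matching residue order (no fitting step is performed), and the function name and toy coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("no atoms supplied")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα positions: the "model" is the "reference" shifted 2 Å along x.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(x + 2.0, y, z) for x, y, z in reference]
    print(ca_rmsd(reference, model))
```

In practice the two structures are first optimally superimposed; that step is omitted here to keep the sketch short.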
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
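The E-value screening step described above can be sketched as a simple filter over a list of search hits. The cut-off of 1e-5, the Hit record and the hit names below are assumptions chosen for illustration; they are not values prescribed by BLAST or by this report.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template_id: str   # identifier of the candidate template
    score: float       # alignment score reported by the search
    e_value: float     # expectation value reported by the search


def select_templates(hits: List[Hit], max_e_value: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cut-off, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [
        Hit("tmplC", 27.0, 6.0),
        Hit("tmplB", 95.0, 3e-20),
        Hit("tmplA", 200.0, 1e-50),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.e_value)
```

Hits that fail the cut-off are dropped rather than ranked last, mirroring the advice that a template with a poor E-value should generally not be used.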
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig: Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement, which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). Because a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii, and the angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
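Each point on such a plot is a pair of torsion angles: φ is the dihedral defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four atomic positions; the function names and the test points are assumptions for illustration and do not come from any validation program used in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    b1n = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1n) * c for c in b1n))
    w = _sub(b2, tuple(_dot(b2, b1n) * c for c in b1n))
    x = _dot(v, w)
    y = _dot(_cross(b1n, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points: a quarter turn about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to the backbone atoms of every residue gives the (φ, ψ) pairs that are plotted against each other.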
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
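The idea of moving atoms along the force until the net force is small can be illustrated with a deliberately tiny example: steepest descent on the separation of two atoms interacting through a Lennard-Jones potential. The potential, the reduced units (epsilon = sigma = 1), the step size and the function names are all assumptions made for this sketch; it is not the force field or minimizer used by any program in this report.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.01, max_move=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is tiny."""
    r = r0
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        # Cap each move so a huge repulsive force cannot fling the atoms apart.
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r


if __name__ == "__main__":
    start = 1.0  # two atoms too close together (repulsive region)
    r_min = minimize_distance(start)
    print(lj_energy(start), lj_energy(r_min), r_min)
```

Starting from an overlapping pair, the distance relaxes to the nearest minimum of the potential; with many atoms the same procedure stops in whichever local minimum lies downhill from the starting conformation.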
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag)Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33       6.3%
Asn (N)   11       2.1%
Asp (D)   21       4.0%
Cys (C)    5       1.0%
Gln (Q)   32       6.1%
Glu (E)   28       5.4%
Gly (G)   51       9.8%
His (H)   16       3.1%
Ile (I)   15       2.9%
Leu (L)   81      15.5%
Lys (K)   10       1.9%
Met (M)   12       2.3%
Phe (F)   10       1.9%
Pro (P)   28       5.4%
Ser (S)   27       5.2%
Thr (T)   22       4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)   38       7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N  711
Oxygen    O  719
Sulfur    S  17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
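The atom total and the molecular weight can be cross-checked directly from the formula. The sketch below parses the formula string and sums standard average atomic masses; the function names and the rounded mass values are assumptions for this example, so the computed mass agrees with the reported 55252.6 only to within rounding.

```python
import re

# Rounded standard atomic masses (assumed values for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Turn e.g. 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def total_atoms(formula: str) -> int:
    return sum(parse_formula(formula).values())


def molecular_mass(formula: str) -> float:
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    formula = "C2435H3967N711O719S17"  # formula reported in this section
    print(total_atoms(formula), round(molecular_mass(formula), 1))
```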
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
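Both extinction coefficients can be reproduced from the residue counts listed in the amino acid composition (Tyr 5, Trp 4, Cys 5) with the widely used 280 nm formula, E = 1490 n(Tyr) + 5500 n(Trp) + 125 n(cystine). The sketch below is a worked check under that assumption; the function names are illustrative and this is not ProtParam's code.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, cys_as_cystine=True):
    """Molar extinction coefficient at 280 nm in M-1 cm-1.

    Each cystine is formed from two Cys residues, so an odd Cys is left unpaired.
    """
    n_cystine = n_cys // 2 if cys_as_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported in this section
    for paired in (True, False):
        ext = extinction_coefficient(5, 4, 5, cys_as_cystine=paired)
        print(ext, round(absorbance_at_1_g_per_l(ext, mw), 3))
```

With these inputs the calculation gives 29700 and 29450, and absorbances of 0.538 and 0.533, matching the values listed above.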
Secondary structure prediction
By SOPMA
Result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
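The percentages in this summary are simply per-state residue counts divided by the sequence length. A minimal sketch of that tally is given below, assuming the prediction is available as a string with one state letter per residue (h, e, t, c); the function name and the synthetic state string are assumptions for illustration, not SOPMA output.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(states: str) -> dict:
    """Map each state name to (count, percent of sequence length)."""
    states = states.lower()
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    length = len(states)
    return {
        name: (counts[letter], round(100.0 * counts[letter] / length, 2))
        for letter, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    # Synthetic string with the same per-state counts as the summary above.
    states = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    for name, (count, percent) in state_percentages(states).items():
        print(f"{name}: {count} is {percent:.2f}%")
```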
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (pdb)
Choose File - Save - Project (pdb)
Choose Swiss model - submit modeling request (a new browser window is opened; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot SoftwareSAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS
SAVES results for proj_gunjan.pdb
PROCHECK summary
Ramachandran plot
Result: plot statistics

                                                        Score    %-age
Residues in most favoured regions [A,B,L]                 990     85.6
Residues in additional allowed regions [a,b,l,p]          104      9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11      1.0
Residues in disallowed regions                             51      4.4
                                                         ----    -----
Number of non-glycine and non-proline residues           1156    100.0

Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
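The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. The short sketch below recomputes them; the dictionary keys and the function name `region_percentages` are illustrative assumptions, and the counts are the ones reported above.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Percentage of non-Gly, non-Pro residues in each region (1 decimal)."""
    total = sum(region_counts.values())
    return {name: round(100.0 * n / total, 1)
            for name, n in region_counts.items()}


percentages = region_percentages(counts)
total_scored = sum(counts.values())

# PROCHECK guideline: a good quality model has over 90% of residues
# in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for name, pct in percentages.items():
    print(f"{name:20s} {counts[name]:5d} {pct:6.1f}")
print("scored residues:", total_scored)
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, this model falls short of the 90% guideline, and 4.4% of residues sit in disallowed regions.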
Docking result (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
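The size of the search reported in this part of the log can be reproduced with a few lines of arithmetic. The sketch below assumes that the log's Iterate dimensions are 27 distance steps, 812 receptor rotations, and 1 ligand rotation, and that the FFT grid is 64 x 24 x 48. Separators were lost when the log was pasted, so this reading is an assumption; it is chosen because it reproduces both the 21924 FFTs and the 1616412672 orientations that the log reports. The function and variable names are illustrative.

```python
import math


def radial_steps(r_min, r_max, step):
    """Number of sampled distances from r_min to r_max, both inclusive."""
    return int(round((r_max - r_min) / step)) + 1


# Distance range from the log: 0.00 to 19.50 in steps of 0.75.
n_distances = radial_steps(0.00, 19.50, 0.75)

# Assumed reading of the Iterate dimensions in the log.
receptor_rotations = 812
ligand_rotations = 1
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Assumed reading of the FFT dimensions in the log.
fft_grid = (64, 24, 48)
orientations_per_fft = math.prod(fft_grid)

total_orientations = n_ffts * orientations_per_fft

print("distance steps:", n_distances)
print("3D FFTs:", n_ffts)
print("orientations per FFT:", orientations_per_fft)
print("total orientations:", total_orientations)
```

Each 3D FFT scores one block of orientations at a fixed distance and receptor rotation, which is why the number of FFTs equals the product of the Iterate dimensions.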
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although the strains are all closely related, the proteins play an
important role in survival in different species. It would be interesting to
look more closely at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution in the near future, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling and went a step further by
presenting a theoretical model built with available online modelling tools.
In our study, the glycerate kinase protein (HBeAg-binding protein 4) is one
of the secondary causes of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and will prove fruitful to anyone working in a similar
area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded
circular DNA virus of complex structure. HBV is classified as an
orthohepadnavirus within the family Hepadnaviridae. The serum of
individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and
DNA polymerase) called the Dane particle, and tubular or filamentous
particles that vary in length. The Dane particle is the infective form of
the virus. Hepatitis B is normally transmitted by blood transfusion,
contaminated equipment, drug users' unsterile needles, or any body
secretion.
The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to
the envelope, nucleocapsid, and polymerase antigens eliminates infected
cells. The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires
further analysis. Viral evasion by epitope inactivation and T cell receptor
antagonism may contribute to the worsening of viral persistence in the
setting of an ineffective immune response, as can the incomplete
down-regulation of viral gene expression and the infection of
immunologically privileged tissues. Chronic liver cell injury and the
attendant inflammatory and regenerative responses create the mutagenic and
mitogenic stimuli for the development of DNA damage that can cause
hepatocellular carcinoma. Elucidation of the immunological and virological
basis for HBV persistence may yield immunotherapeutic and antiviral
strategies to terminate chronic HBV infection and reduce the risk of its
life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and
chronic infections are often lifelong. Chronic infections can lead to liver
failure with cirrhosis and hepatocellular carcinoma. The replication
strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still
poorly understood. Studies on how the virus evades the immune response to
cause prolonged transient infections with high-titer viremia and lifelong
infections with an ongoing inflammation of the liver are still at an early
stage, and the role of the virus in liver cancer is still elusive. The
state of knowledge in this very active field is therefore reviewed with an
emphasis on past accomplishments as well as goals for the future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication
is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral
replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life
and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in
chronic carriers and in those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural
templates. The procedure involves the identification of possible templates
that have a clear sequence relationship to the query; the assembly of the
model; the prediction of regions of the structure that are likely to have
different conformations than the templates (e.g. loops); and, ultimately,
the refinement of the structure in an attempt to account for inherent
differences between the template and query structures. As mentioned above,
homology modeling figures heavily as a rationale for structural genomics
initiatives, under the stated assumption that accurate models can be built
for query sequences that have greater than 30% sequence identity with their
best template.
The quality of the alignment of the query to the template sequence is a
major factor in determining the quality of homology models. This is one of
the sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
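The 30% rule is stated in terms of percent sequence identity, which can be computed directly from a pairwise alignment. The sketch below is a minimal illustration, assuming two already-aligned sequences of equal length with `-` as the gap character. Several definitions of identity are in use; the one here divides by the number of aligned, non-gap columns, which is an assumption rather than the definition used by SWISS-MODEL. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns that contain a gap
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


# Hypothetical toy alignment: 7 gap-free columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MRTQAY-AK")
print(f"{identity:.1f}% identity")

# The model in this report was built at 34% identity to its template,
# just above the 30% guideline discussed above.
above_threshold = 34 > 30
print("above 30% guideline:", above_threshold)
```

A template just above the threshold, as here, is usable but leaves little margin, so alignment quality deserves particular attention before the model is interpreted.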
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on
the use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring
but not on orienting the ligand during sampling. In contrast, direct
approaches incorporate chemical information explicitly, thus actively
guiding the orientation of the ligand during sampling. Direct approaches
can be further divided into protein-based, mapping-based, and ligand-based
approaches to reflect the source used to derive the features capturing the
chemical information inside the protein cavity. Within each category, a
representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that
making optimal use of chemical information represents an efficient
knowledge-based strategy for improving binding affinity estimations, ligand
binding-mode predictions, and virtual screening enrichments obtained from
protein-ligand docking.
This review gives an introduction into ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some
major challenges ahead, which are outlined here as well. Approaches to
address some of these challenges and the latest developments in the area
are presented. Some aspects of the assessment of docking program
performance are discussed. A number of successful applications of
structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface
between computer science and biological science. It involves the technology
that uses computers for storage, retrieval, manipulation, and distribution
of information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis
of genes and genomes and their corresponding products, and is often
considered computational molecular biology. It consists of two subfields:
the development of computational tools and databases, and the application
of these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a minimal
level of redundancy, and a high level of integration with other databases.
2 Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
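To make the Smith-Waterman idea mentioned above concrete, the following is a minimal sketch of the local-alignment scoring recurrence. It is not the SSEARCH implementation: the function name and the simple match, mismatch, and linear gap scores are assumptions chosen for illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scores are illustrative assumptions (identity scoring, linear gaps).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only the best score is returned here; recovering the aligned residues would additionally require a traceback through the full matrix.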
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the
success rate in the prediction of the secondary structure of proteins. Its
authors report improvements brought about by predicting all the sequences
of a set of aligned proteins belonging to the same family. This improved
SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a
three-state description of the secondary structure (α-helix, β-sheet, and coil)
in a whole database containing 126 chains of non-homologous (less than 25%
identity) proteins. Joint prediction with SOPMA and a neural networks
method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted
amino acids. Predictions are available by email to deleage@ibcp.fr or on a
Web page (http://www.ibcp.fr/predict.html).
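The percentages quoted above are per-residue three-state accuracies. As a rough sketch of how such a figure is obtained, the function below compares a predicted state string with an observed one. The function name and the H/E/C state letters are assumptions made for illustration, and the two short strings in the usage line are invented, not real predictions.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Invented eight-residue example: seven of eight states agree.
print(three_state_accuracy("HHHEECCC", "HHHEECCH"))
```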
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N)
Loaded the target sequence in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
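Agreement between matched Cα atoms, as quoted above in Å, is commonly expressed as a root-mean-square deviation. The sketch below computes it for two equally long lists of Cα coordinates. It assumes the two structures have already been superposed and that the residues are already paired one to one; the function name and the toy coordinates are illustrative only.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input) between paired Cα coordinates.

    Assumes the structures are already superposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom of the model is displaced by 2 Å along z.
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(ca_rmsd(model, reference))
```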
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
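The E-value screening described above can be sketched in a few lines. The hit list below reuses the PDB identifiers, scores, and E-values from the BLAST listing in the results of this report; the two values of 6.0 assume that a decimal point was lost in that listing. The cutoff of 1e-5 and the function name are assumptions for illustration, not a recommended threshold.

```python
# (PDB chain, score, E-value) as read from the BLAST listing in the results section.
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),   # assumed decimal point
    ("1AW9-A", 27, 6.0),   # assumed decimal point
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


for pdb_id, score, evalue in select_templates(HITS):
    print(pdb_id, score, evalue)
```

With this cutoff the three hepatitis B capsid chains are retained and the two marginal hits are discarded, which matches the guidance that a template with a poor E-value should generally not be chosen.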
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can either have a rigid ligand and a flexible receptor, or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the
interface area between the molecules. The molecules move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the free energy of
activation (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
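Each point on such a plot comes from two torsion angles computed from backbone atom coordinates: φ from the atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of the torsion-angle calculation for four points; the helper names and the test coordinates are illustrative assumptions, not part of any validation package used in this work.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points arranged so that the outer bonds are at right angles.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to every residue of a model and plotting ψ against φ gives the scatter that is then compared with the sterically allowed regions.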
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths and
bond angles, as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass over high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
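The gradient optimization described above can be illustrated on the smallest possible system: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) potential. This is a toy sketch in reduced units, not the minimizer of any modeling package; the step size, displacement cap, and tolerance are assumptions chosen so that the example converges.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (minus the force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.001, max_move=0.05, tol=1e-8, max_steps=100000):
    """Steepest descent: move against the gradient until the net force is ~0."""
    for _ in range(max_steps):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r


start = 0.95  # atoms too close: strong repulsion, positive energy
r_min = minimize(start)
print(lj_energy(start), r_min, lj_energy(r_min))
```

Starting from the clashing distance, the separation relaxes to the nearby minimum at 2^(1/6) σ, where the energy is -ε. As in the text, the procedure only finds the nearest minimum and cannot cross energy barriers.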
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp      27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                 27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
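The extinction coefficients above can be checked by hand from the amino acid composition. The sketch below uses the per-residue molar extinction values at 280 nm commonly used for this estimate (5500 for Trp, 1490 for Tyr, 125 per cystine); the function names are illustrative, and the molecular weight of 55252.6 is the value reported in the ProtParam output above.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Estimated molar extinction coefficient at 280 nm (M-1 cm-1).

    If cystines is True, all Cys residues are assumed to pair into cystines;
    an unpaired leftover Cys contributes nothing.
    """
    epsilon = 5500 * n_trp + 1490 * n_tyr
    if cystines:
        epsilon += 125 * (n_cys // 2)
    return epsilon


def absorbance_1_g_per_l(epsilon, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution for a 1 cm path length."""
    return epsilon / molecular_weight


# Composition of GLCTK_HUMAN from the table above: Trp 4, Tyr 5, Cys 5.
for paired in (True, False):
    eps = extinction_coefficient(4, 5, 5, cystines=paired)
    print(eps, round(absorbance_1_g_per_l(eps, 55252.6), 3))
```

Run on the composition above, this gives 29700 and 29450 with absorbances of 0.538 and 0.533, in agreement with the reported figures.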
Secondary structure prediction
By SOPMA, result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
3-10 helix (Gg):          0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states (?):     0 is  0.00%
Other states:             0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4      -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3      -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV       -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV       -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN      CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV       NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV       NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1      DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3      DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN      PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3      RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV       RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV       RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN      GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
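The guide tree above is written in Newick (nested parenthesis) notation. The following is a minimal sketch of how such a tree can be read programmatically. It assumes that the extraction dropped the colons, commas and decimal points that a guide tree file normally carries, so that, for example, `059519` is read as a branch length of `0.59519`. The constant `GUIDE_TREE` and the helper names are illustrative only and are not part of any tool used in this report.

```python
import re

# Guide tree in Newick form. The punctuation is an assumed reconstruction
# (colons, commas and decimal points restored), not verbatim tool output.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf name: branch length} for every leaf in a Newick string."""
    # A leaf is a label that directly follows "(" or "," and ends at ":".
    # Internal nodes follow ")" and carry no label, so they are skipped.
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


def is_balanced(newick):
    """Check that every opening parenthesis has a matching close."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


leaves = leaf_branch_lengths(GUIDE_TREE)
# List the sequences from the shortest to the longest terminal branch.
for name, length in sorted(leaves.items(), key=lambda item: item[1]):
    print(f"{name:22s} {length:.5f}")
```

Under this reading, the human glycerate kinase sequence sits on by far the longest terminal branch, which is consistent with it being the only non-HBeAg sequence in the alignment.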
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
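Template selection rests on the percent identity between target and template. As a hedged illustration, the sketch below computes identity over the alignment columns in which neither sequence has a gap. This is only one of several conventions for the denominator, and the function name `percent_identity` is hypothetical. The two example strings are the HBVA4 and HBVA6 rows of the alignment block ending at position 117, with the trailing gap columns trimmed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Rows P17099|HBEAG_HBVA4 and Q91C37|HBEAG_HBVA6 from the alignment above.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(round(percent_identity(hbva4, hbva6), 2))
```

The two rows differ only in their first two columns, so 30 of the 32 compared positions match.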
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (PDB file).
2. Choose Swiss model, then load the raw target sequence.
3. Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
4. Choose File, Save, Layer (pdb).
5. Choose File, Save, Project (pdb).
6. Choose Swiss model, then submit modeling request. A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.
7. Open the new structure (received by email) and remove the template by selecting the target.
8. Choose File, Save, Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which conformations are possible for a polypeptide. For each residue, the plot reports the rotation about the two backbone bonds on either side of its alpha carbon (N-Cα for φ and Cα-C for ψ).
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
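The φ and ψ angles plotted in a Ramachandran plot are ordinary dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral from Cartesian coordinates. It is an illustration only, not the code used by SAVS or PROCHECK, and the function names are hypothetical.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)  # central bond
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Toy coordinates (not taken from any structure in this report).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

In practice the four coordinates would be read from the ATOM records of the modeled PDB file, one residue at a time.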
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        SCORE    %-age
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
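The percentages in the PROCHECK summary follow directly from the residue counts. The short sketch below recomputes them, reading the fused figures in the summary as a count followed by a percentage (for example `990 856` as 990 residues, 85.6%), and compares the most favoured fraction with the 90% guideline quoted above. The variable names are illustrative.

```python
# Residue counts per Ramachandran region, as read from the summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Convert per-region residue counts into percentages of their total."""
    total = sum(region_counts.values())
    return {region: 100.0 * n / total for region, n in region_counts.items()}


percentages = region_percentages(counts)
for region, pct in percentages.items():
    print(f"{region:20s} {pct:5.1f}%")

# Guideline quoted in the PROCHECK note: over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0
meets_threshold = percentages["most favoured"] > GOOD_MODEL_THRESHOLD
print("meets the 90% guideline:", meets_threshold)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, and 4.4% of residues lie in disallowed regions.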
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
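The size of the search reported in the log can be checked with simple arithmetic. The sketch below assumes that the fused figures read as a distance range of 0.00 to 19.50 in steps of 0.75, and that `Iterate[278121] x FFT[642448]` stands for 27 x 812 x 1 iterations of a 64 x 24 x 48 FFT. Under that reading the products reproduce the 21924 FFTs and 1616412672 orientations that the log reports. The variable names are illustrative and are not Hex parameters.

```python
def radial_steps(r_min, r_max, step):
    """Intermolecular distances sampled from r_min to r_max inclusive."""
    n = int(round((r_max - r_min) / step)) + 1
    return [r_min + i * step for i in range(n)]


# Assumed reading of the log: R12 range 0.00 to 19.50, step 0.75.
steps = radial_steps(0.0, 19.5, 0.75)

# Rotational increments reported in the log (Receptor 812, Ligand 1).
receptor_rotations = 812
ligand_rotations = 1

# Assumed reading of FFT[642448] as a 64 x 24 x 48 grid.
fft_grid = (64, 24, 48)

n_ffts = len(steps) * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print("radial steps:      ", len(steps))
print("3D FFTs:           ", n_ffts)
print("total orientations:", total_orientations)
```

Note that the log docks 2B8N against itself (receptor and ligand are the same file) and reports zero energies for the single clustered solution.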
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a
closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will be possible in the near future to draw firm
conclusions about the true picture of its evolution and to identify the
genes responsible for pathogenesis.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To find the unknown structure of a protein
present in the different species, we carried out homology modelling and went
a step further to present a theoretical model using available online modelling tools.
In our study, the HBeAg (glycerate kinase)
protein encoded by the gene is one of the secondary causes of the pathogenicity
of HBV. We therefore tried to dock this protein with an appropriate ligand in
order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone for any such
discoveries. The present work might be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure with
cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions leading
to acute and chronic disease are still poorly understood. Studies on how the
virus evades the immune response to cause prolonged transient infections
with high-titer viremia and lifelong infections with an ongoing inflammation
of the liver are still at an early stage, and the role of the virus in liver cancer
is still elusive. The state of knowledge in this very active field is therefore
reviewed with an emphasis on past accomplishments as well as goals for the
future.
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication is
occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.
Homology or comparative modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops) and, ultimately, the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. As mentioned above, homology
modeling figures heavily as a rationale for structural genomics initiatives,
under the stated assumption that accurate models can be built for query
sequences that have greater than 30% sequence identity with their best
template.
The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the "30% rule", because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods, such as those described above, should result in continuing
improvements in the quality of homology models.
With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over the last years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term "guided docking" is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the use
of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a minimal
level of redundancy, and a high level of integration with other databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
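To make the idea of an optimal local alignment concrete, the following is a minimal sketch of Smith-Waterman scoring. It is not the SSEARCH implementation: the function name and the match, mismatch, and gap values are illustrative assumptions, and a simple linear gap penalty stands in for the substitution matrices and gap models that real search programs use.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumed scoring scheme (illustrative only): fixed match/mismatch
    scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only the score is returned here; recovering the aligned residues would additionally require keeping the full matrix and tracing back from the best cell.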
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
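As an illustration of how one of these parameters is derived from sequence alone, the sketch below computes the aliphatic index using Ikai's formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name is an assumption made for this example; it is not part of ProtParam.

```python
from collections import Counter


def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence (one-letter codes).

    Uses mole percentages of Ala, Val, Ile and Leu with the
    coefficients 2.9 and 3.9 from Ikai's formula.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)

    def mole_percent(residue):
        return 100.0 * counts[residue] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy peptide, not a sequence from this study.
    print(aliphatic_index("MAAALQVLPRLARAPLHPLLWRGSVARLASS"))
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is usually read as a rough indicator of thermostability.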
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
2) Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
3) Identified the template structure by a BLAST similarity search (PDB ID: 2B8N).
4) Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
5) Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
6) Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
7) Selected the best ligand for HBV disease from the KEGG database.
8) Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
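Agreement between matched Cα atoms of a model and a reference structure is commonly reported as a root mean square deviation (RMSD). The sketch below shows that calculation under two stated assumptions: the two coordinate lists are already paired residue by residue, and the structures have already been superimposed (no fitting is performed). The function name is illustrative.

```python
import math


def ca_rmsd(model_coords, reference_coords):
    """RMSD in the input units (e.g. angstroms) between paired Calpha coordinates.

    Assumes both lists hold (x, y, z) tuples in matching order and that
    the structures were superimposed beforehand.
    """
    if len(model_coords) != len(reference_coords) or not model_coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(model_coords, reference_coords):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(model_coords))


if __name__ == "__main__":
    # Hypothetical coordinates for three matched Calpha atoms.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```

In practice an optimal superposition (for example with the Kabsch algorithm) is performed first, since RMSD computed on unaligned coordinates mostly measures the difference in placement rather than in shape.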
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of
the aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints,
and thus the interactions are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor–Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible
receptor, or a flexible ligand and a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred.
If the van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components.
It is a molecule within a cell or on a cell surface to which a substance (such as
a hormone or a drug) selectively binds, causing a change in the activity of the
cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). Because a
ligand binds through the interaction of many weak noncovalent bonds formed
to the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ)
of amino acid residues in protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The
angles which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
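Each point on such a plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The following is a minimal sketch of computing a signed dihedral angle from four 3D points. The function name and the test coordinates are assumptions for illustration only.

```python
import math


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points in 3D space."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b0 = sub(p1, p0)            # first bond vector
    b1 = sub(p2, p1)            # central bond (rotation axis)
    b2 = sub(p3, p2)            # last bond vector
    n1 = cross(b0, b1)          # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)          # normal of the plane through p1, p2, p3
    b1_length = math.sqrt(dot(b1, b1))
    y = b1_length * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four hypothetical points arranged to give a right-angle torsion.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this to every residue of a structure gives the (φ, ψ) pairs that are plotted against the allowed and disallowed regions.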
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed
residue-by-residue listing. PROCHECK generates a number of output files in
the default directory, which have the same name as the original PDB file but
with different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties, by residue, in a
printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
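The idea of gradient optimization can be shown on a deliberately tiny system: two atoms interacting through a Lennard-Jones potential, with the interatomic distance as the only degree of freedom. This is a toy sketch, not the procedure used by any particular modelling package; the function names, the reduced units (epsilon = sigma = 1), and the step size are all assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (the force is its negative)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r_start, step=0.001, tolerance=1e-8, max_steps=100000):
    """Steepest descent: move downhill along -dE/dr until the force is tiny."""
    r = r_start
    for _ in range(max_steps):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient
    return r


if __name__ == "__main__":
    # Start with the atoms too close together (strong repulsion).
    r_min = minimize_distance(1.0)
    print(r_min, lj_energy(r_min))
```

Starting from a clashing geometry, the distance relaxes to the nearby minimum at 2^(1/6) sigma, where the energy equals -epsilon. With many atoms the same scheme is applied to all coordinates at once, and, as noted above, it only finds the nearest local minimum.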
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: Name: GLYCTK; Synonyms: HBEBP4; ORFNames: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                               Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5       206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina       197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad             192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10)
             In Comp                                                      27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I        27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
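The two extinction coefficients reported above can be reproduced from the residue counts in the composition table (Tyr 5, Trp 4, Cys 5). The sketch below assumes the commonly used molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 for each cystine, and reads the reported molecular weight as 55252.6 Da; the function names are illustrative and are not part of ProtParam.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_as_cystine=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from residue counts.

    Assumed per-chromophore values: Tyr 1490, Trp 5500, cystine 125.
    When cystines are counted, every pair of Cys forms one cystine.
    """
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution for a 1 cm path length."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    for with_cystines in (True, False):
        ext = extinction_coefficient(5, 4, 5, all_cys_as_cystine=with_cystines)
        print(ext, round(absorbance_1_g_per_l(ext, 55252.6), 3))
```

With five cysteines only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.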
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
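Pairwise scores of this kind are percent-identity style measures between two sequences. As a rough sketch of the idea, the function below counts identical residues over the columns where both aligned rows carry a residue. This is an assumed, simplified definition and is not claimed to reproduce ClustalW2's exact scoring; the function name is illustrative.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Columns where either row has a gap are skipped; the result is the
    share of identical residues among the remaining columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    identical = 0
    for res_a, res_b in zip(row_a, row_b):
        if res_a == gap or res_b == gap:
            continue
        aligned += 1
        if res_a == res_b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


if __name__ == "__main__":
    # Toy aligned rows, not taken from the alignment in this report.
    print(percent_identity("MQLFHLC-LII", "MYLFHLCSLVF"))
```

Different tools normalise by different lengths (aligned columns, the shorter sequence, or the full alignment), so values from this sketch should be compared with reported scores only qualitatively.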
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
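A CLUSTAL alignment is printed in blocks, so each sequence is spread over several rows. The sketch below shows one simple way the rows could be joined back into one gapped string per sequence. It assumes a cleanly formatted file in which each row is a name, a sequence chunk and an optional residue count; the function name parse_clustal and the toy alignment are hypothetical and are not part of ClustalW.

```python
from typing import Dict


def parse_clustal(text: str) -> Dict[str, str]:
    """Join the per-block rows of a CLUSTAL alignment into full sequences."""
    sequences: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # blank separator or header line
        if line[0].isspace():
            continue  # conservation line (*, :, .) printed under a block
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # a trailing residue count is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


if __name__ == "__main__":
    toy_alignment = (
        "CLUSTAL 2.0.5 multiple sequence alignment\n"
        "\n"
        "seqA      MK-LV 4\n"
        "seqB      MKALV 5\n"
        "          ** **\n"
        "\n"
        "seqA      AG 6\n"
        "seqB      -G 6\n"
    )
    for name, seq in parse_clustal(toy_alignment).items():
        # Ungapped length is the number of residues actually present
        print(name, seq, len(seq.replace("-", "")))
```

The joined strings can then be passed to any column-wise analysis, for example an identity count between two rows.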
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
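The guide tree is written in Newick notation: nested parentheses for the groupings and a number after each colon for the branch length. As a minimal sketch of how such a string can be read, the code below walks the tree and sums the branch lengths from the root to every leaf. The function name leaf_depths is hypothetical, and the GUIDE_TREE string is the tree from this report with its colons, commas and decimal points restored by assumption, since the extracted text lost them.

```python
from typing import Dict, List, Tuple

# Guide tree from the report; punctuation restored by assumption
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_depths(newick: str) -> Dict[str, float]:
    """Return the summed branch length from the root to every leaf."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_token() -> str:
        """Read characters up to the next structural symbol."""
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def parse_subtree() -> List[Tuple[str, float]]:
        """Parse one node; return (leaf name, distance to this node's parent)."""
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            leaves: List[Tuple[str, float]] = []
            while True:
                leaves.extend(parse_subtree())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character {text[pos]!r}")
            read_token()  # optional internal node label, ignored here
        else:
            leaves = [(read_token(), 0.0)]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_token())
        return [(name, dist + length) for name, dist in leaves]

    return dict(parse_subtree())


if __name__ == "__main__":
    for name, depth in sorted(leaf_depths(GUIDE_TREE).items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {depth:.5f}")
```

Sorting by depth lists the sequences nearest the root first, which makes the distant position of the human glycerate kinase sequence easy to see.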
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file).
Choose icon: Swiss model - load the raw target sequence.
Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon: File - Save - Layer (PDB).
Choose icon: File - Save - Project (PDB).
Choose icon: Swiss model - submit modeling request. (A new browser window opens; load the PDB file there and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon: File - Save - Layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process.
The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. Each residue is plotted as one point, and the plot shows which combinations of φ and ψ are sterically possible for a polypeptide. Here φ is the torsion angle about the N-Cα bond and ψ is the torsion angle about the Cα-C bond of a residue. Residues that fall outside the allowed regions point to parts of the model whose backbone geometry is doubtful.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
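The φ and ψ angles plotted in a Ramachandran plot are ordinary torsion angles defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for φ, and N(i), Cα(i), C(i), N(i+1) for ψ. The sketch below shows how one such angle can be computed from atomic coordinates. The function names are hypothetical and the coordinates in the usage example are made-up points chosen to give an easily checked angle; this is not the code used by PROCHECK or SAVS.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]
Vec = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Point, n: Point, ca: Point, c: Point) -> float:
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Point, ca: Point, c: Point, n_next: Point) -> float:
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Made-up coordinates: the outer bonds are at right angles about the z axis
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying phi and psi to every residue of a model gives the list of points that a Ramachandran plot displays.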
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                          No. of residues    %
Residues in most favoured regions [A,B,L]                       990         85.6
Residues in additional allowed regions [a,b,l,p]                104          9.0
Residues in generously allowed regions [~a,~b,~l,~p]             11          1.0
Residues in disallowed regions                                   51          4.4
                                                               ----        -----
Number of non-glycine and non-proline residues                 1156        100.0
Number of end-residues (excl. Gly and Pro)                        8
Number of glycine residues (shown as triangles)                 127 (59)
Number of proline residues                                       60
                                                               ----
Total number of residues                                       1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
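The percentages in the plot statistics follow directly from the residue counts, taken over the non-glycine, non-proline residues. The short sketch below recomputes them and checks the most favoured fraction against the 90% guideline quoted above. The counts are the ones reported in the PROCHECK summary; the variable and function names are hypothetical.

```python
from typing import Dict

# Residue counts from the PROCHECK plot statistics in this report
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of residues per region, rounded to one decimal place."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_quality_guideline(counts: Dict[str, int], threshold: float = 90.0) -> bool:
    """True when more than `threshold` percent of residues are most favoured."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    print(sum(REGION_COUNTS.values()))          # 1156 residues evaluated
    print(region_percentages(REGION_COUNTS))
    print(meets_quality_guideline(REGION_COUNTS))  # False: 85.6% is below 90%
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline for a good quality model.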
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance: R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min, 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min, 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min, 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln   Etotal    Eshape    Eforce     Eair          RMS         Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min, 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
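The size of the search reported in the docking log can be checked with simple arithmetic. The sketch below assumes that the distance range runs from 0.00 to 19.50 in steps of 0.75, that 812 receptor and 1 ligand rotational increments were used, and that the FFT grid was 64 x 24 x 48. These readings restore decimal points and commas that the extracted log lost, so they are assumptions, and the names used are hypothetical. The code does not reproduce Hex itself, only the bookkeeping of how many orientations were scanned.

```python
from typing import List


def distance_steps(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced intermolecular distances from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


# Values read from the docking log (punctuation restored by assumption)
R_START, R_STOP, R_STEP = 0.0, 19.5, 0.75
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)

distances = distance_steps(R_START, R_STOP, R_STEP)

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination
n_ffts = len(distances) * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS

# Each FFT scores every cell of the FFT grid at once
orientations_per_fft = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
total_orientations = n_ffts * orientations_per_fft

if __name__ == "__main__":
    print(len(distances))       # 27 distance steps
    print(n_ffts)               # 21924, as in "Done 21924 3D FFTs"
    print(total_orientations)   # 1616412672, the total 6D space in the log
```

Under these assumptions the product agrees with the 21924 FFTs and 1616412672 orientations stated in the log.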
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they have an important role in
survival in different species. It is interesting to take a closer look at
the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw a conclusive picture of its
evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we performed homology modelling. We went a step further and
presented a theoretical model using available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is
one of the secondary reasons for the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, on the basis of which drugs can be developed.
Future prospects
The work presented in this report might just be a stepping stone for any
such discoveries. The present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408. [PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates that the person, chronic carrier or not, has been exposed to HBV.
Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models.
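The 30% rule depends on how sequence identity is counted. The following is a minimal sketch, assuming identity is measured over the aligned (non-gap) columns of a pairwise alignment; other denominators (for example the shorter sequence length) are also in use, and the function names are illustrative only.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted (an assumption of this sketch)
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def above_30_percent_rule(aligned_query: str, aligned_template: str) -> bool:
    """True when the pair exceeds the 30% identity threshold discussed above."""
    return percent_identity(aligned_query, aligned_template) > 30.0


if __name__ == "__main__":
    # Toy alignment: 8 aligned columns, 7 of them identical.
    print(percent_identity("MQLFHL-CL", "MYLFHLCCL"))
```

A template falling below the threshold is not automatically unusable, but its alignment deserves closer inspection before a model is built on it.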
With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.
1 NCBI-
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
Swiss-Prot-
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
2 Protein sequence-
Glycerate kinase (HBeAg-binding protein 4) from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it is an extension of the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs.
The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
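To illustrate what an optimal local alignment computes, the following is a minimal sketch of the Smith-Waterman recurrence. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the parameter values are arbitrary choices for illustration), whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                # start a new local alignment here
                prev[j - 1] + s,  # extend diagonally (match or mismatch)
                prev[j] + gap,    # gap in sequence b
                curr[j - 1] + gap,  # gap in sequence a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared segment "ACG" scores 3 matches x 2 = 6.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because every cell is floored at zero, the score reflects only the best-matching local segment, which is why the method suits the detection of shared domains inside otherwise unrelated sequences.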
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.
It calculates the following parameters, among others:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the HBeAg protein from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) by a BLAST similarity search.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query sequence" or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity; a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
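The "agreement between the matched Cα atoms" quoted above is usually expressed as a root-mean-square deviation (RMSD). The following is a minimal sketch, assuming the two structures are already superimposed and that the coordinate lists contain the same matched Cα atoms in the same order; it does not perform the optimal superposition that structure-comparison programs normally apply first.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD in the units of the input (e.g. angstroms) over matched atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: the model is displaced by 2 units along x.
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(x + 2.0, y, z) for x, y, z in template]
    print(ca_rmsd(template, model))
```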
Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step, and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
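The E-value screen and coverage criterion described above can be sketched as a simple filter. This is an illustrative sketch only: the hit names, the coverage values, and the 1e-5 cutoff are hypothetical assumptions, not values taken from this study or prescribed by BLAST.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template: str    # hypothetical template identifier
    evalue: float    # E-value reported by the database search
    coverage: float  # fraction of the query covered by the alignment (0..1)


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Drop hits with a poor E-value, then rank by E-value and coverage."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    # Lowest E-value first; higher coverage breaks ties.
    return sorted(kept, key=lambda h: (h.evalue, -h.coverage))


if __name__ == "__main__":
    candidates = [
        Hit("tmplA", 6e-54, 0.95),
        Hit("tmplB", 2e-51, 0.90),
        Hit("tmplC", 6.0, 0.20),  # poor E-value: rejected even if alone
    ]
    for hit in select_templates(candidates):
        print(hit.template, hit.evalue, hit.coverage)
```

Returning an empty list when nothing passes the cutoff mirrors the advice above that a template with a poor E-value should not be used merely because it is the only one available.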
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interfaces are therefore said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow for docking of a larger ligand than would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller in size than the receptor interface. No docking is completely rigid, though; there is intrinsic movement which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in the search for a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the ΔG of the reaction, which results in the rate enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The table below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide the main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles which cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
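Each point on such a plot is a pair of torsion angles computed from four consecutive backbone atoms. The following is a minimal sketch of that calculation; the function name and the toy coordinates are illustrative assumptions. For φ of residue i the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1).

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle about the p1-p2 bond, in degrees (-180..180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometry: the fourth atom is rotated a quarter turn about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing (φ, ψ) for every residue of a model and checking the pairs against the allowed regions is the basis of the Ramachandran validation used later in this work.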
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run. The run time also depends on how many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file which has a file extension of .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of the various interactions is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, for instance two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
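The idea of moving atoms along the gradient until the net force is small can be shown on the smallest possible case: two atoms that start too close together. The following is a minimal sketch, assuming a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1) and a fixed-step steepest descent; real minimizers act on all atomic coordinates of a full force field and use adaptive steps.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones 12-6 energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr; the force on the pair is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.001, tol: float = 1e-10,
                      max_iter: int = 100000) -> float:
    """Steepest descent on r until the gradient magnitude falls below tol."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # move downhill along the gradient
    return r


if __name__ == "__main__":
    r_start = 0.9  # atoms too close: strongly repulsive starting point
    r_min = minimize_distance(r_start)
    print(lj_energy(r_start), r_min, lj_energy(r_min))
```

The minimizer relaxes the clash to the nearest minimum (r = 2^(1/6) sigma for this potential) and stops there, which illustrates why minimization finds the best nearby conformation rather than a global optimum.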
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result-
List of potentially matching sequences-
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
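The two extinction coefficients reported above can be reproduced from the composition. The following is a minimal sketch, assuming the per-residue molar extinction values at 280 nm that ProtParam documents (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are illustrative.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_as_cystine: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in M-1 cm-1."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_1_g_per_l(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    # Counts and molecular weight from the ProtParam output for Q8IVS8.
    trp, tyr, cys, mw = 4, 5, 5, 55252.6
    for paired in (True, False):
        ext = extinction_coefficient(trp, tyr, cys, paired)
        print(ext, round(absorbance_1_g_per_l(ext, mw), 3))
```

With 4 Trp, 5 Tyr and 5 Cys residues this gives 29700 and 29450, matching the values in the listing.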
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
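The pairwise scores above are easier to query once they are held in a symmetric lookup. The following is only a minimal sketch: it copies a handful of rows from the table, and the names `ROWS`, `build_matrix` and `closest_partner` are illustrative helpers, not part of ClustalW.

```python
# A few (SeqA, SeqB, score) rows copied from the pairwise score table.
ROWS = [
    ("Q8IVS8|GLCTK_HUMAN", "Q64896|HBEAG_ASHV", 3),
    ("Q64896|HBEAG_ASHV", "P03154|HBEAG_DHBV1", 21),
    ("Q64896|HBEAG_ASHV", "P03153|HBEAG_GSHV", 91),
    ("Q64896|HBEAG_ASHV", "P0C692|HBEAG_HBVA2", 66),
    ("P03154|HBEAG_DHBV1", "P0C6J9|HBEAG_DHBV3", 97),
    ("P0C692|HBEAG_HBVA2", "P0C625|HBEAG_HBVA3", 98),
]


def build_matrix(rows):
    """Return a symmetric dict-of-dicts so that matrix[a][b] == matrix[b][a]."""
    matrix = {}
    for seq_a, seq_b, score in rows:
        matrix.setdefault(seq_a, {})[seq_b] = score
        matrix.setdefault(seq_b, {})[seq_a] = score
    return matrix


def closest_partner(matrix, name):
    """Return the sequence with the highest pairwise score against `name`."""
    partners = matrix[name]
    return max(partners, key=partners.get)


matrix = build_matrix(ROWS)
print(closest_partner(matrix, "Q64896|HBEAG_ASHV"))
```

With the full 45-row table loaded in the same way, the same lookup shows the human glycerate kinase scoring 2 to 3 against every HBeAg sequence, while the HBV genotype A sequences score 97 to 98 against one another.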
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
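The guide tree is a Newick string, so the terminal branch length of each sequence can be read off with a short parser. The sketch below assumes that the punctuation lost in extraction (colons, commas and decimal points) is restored as shown; `NEWICK` and `leaf_lengths` are illustrative names, and a dedicated phylogenetics library would be the better choice for real work.

```python
import re

# Guide tree with the separators and decimal points restored (an assumption:
# the extracted text had lost all ':', ',' and '.' characters).
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a label that directly follows '(' or ','; the lengths of internal
# nodes follow ')' and are therefore skipped by this pattern.
LEAF = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")


def leaf_lengths(tree):
    """Map each leaf label to its terminal branch length."""
    return {name: float(length) for name, length in LEAF.findall(tree)}


lengths = leaf_lengths(NEWICK)
for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {length:.5f}")
```

Sorted this way, the human glycerate kinase sits on by far the longest branch, which agrees with its very low pairwise scores against the viral sequences.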
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose icon - Swiss model - load the raw target sequence.
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - File - save - layer (PDB).
Choose icon - File - save - project (PDB).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon - File - save - layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles of the amino acid residues in a protein structure by plotting ψ against φ for each residue, and so shows which (φ, ψ) conformations are possible for a polypeptide. For each alpha carbon in the backbone chain, the plot reports the torsion angles about its two backbone bonds (N-Cα for φ and Cα-C for ψ) in the protein of interest.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
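The φ and ψ values on a Ramachandran plot are dihedral angles computed from four consecutive backbone atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. The sketch below shows one common way to compute a signed dihedral from four points using only the standard library. The function name `dihedral` and the test coordinates are illustrative and are not taken from the modeled structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(x * _dot(b0, b1) for x in b1))
    w = _sub(b2, tuple(x * _dot(b2, b1) for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points with a right-angle twist about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the same function to the backbone atoms of every residue in a PDB file gives the (φ, ψ) pairs that PROCHECK bins into favoured, allowed and disallowed regions.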
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                      SCORE    %age
Residues in most favoured regions [A,B,L]               990    85.6%
Residues in additional allowed regions [a,b,l,p]        104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11     1.0%
Residues in disallowed regions                           51     4.4%
                                                       ----   ------
Number of non-glycine and non-proline residues         1156   100.0%
Number of end-residues (excl. Gly and Pro)                8
Number of glycine residues (shown as triangles)         127 (59)
Number of proline residues                               60
                                                       ----
Total number of residues                               1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
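The percentages in the plot statistics follow directly from the residue counts, which makes them easy to re-check. The snippet below is a small sketch of that arithmetic; the dictionary keys and the 90% cut-off variable are illustrative names, with the threshold taken from the PROCHECK note above.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percent = {region: round(100 * n / total, 1) for region, n in counts.items()}

GOOD_MODEL_THRESHOLD = 90.0  # per cent in the most favoured regions
meets_threshold = percent["most favoured"] > GOOD_MODEL_THRESHOLD

for region, value in percent.items():
    print(f"{region:20s} {counts[region]:5d} {value:5.1f}%")
print("meets 90% guideline:", meets_threshold)
```

On these counts the model has 85.6% of its residues in the most favoured regions, so it falls short of the 90% guideline quoted for a good quality model.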
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
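The search-size figures in the Hex log can be cross-checked with a little arithmetic. The sketch below assumes one reading of the flattened log text: a distance range of 0.00 to 19.50 in steps of 0.75, the "Iterate" term as 27 x 812 x 1, and the "FFT" term as a 64 x 24 x 48 grid. These splits are a reconstruction chosen because they reproduce the totals printed in the log, not values confirmed by Hex itself.

```python
import math

# Radial search range from the log (decimal points restored, an assumption).
r_start, r_stop, r_step = 0.0, 19.5, 0.75
n_distance_steps = round((r_stop - r_start) / r_step) + 1

# Initial rotational increments reported in the log for N=16.
receptor_rotations = 812
ligand_rotations = 1

# Assumed FFT grid dimensions (reconstructed from the flattened log text).
fft_grid = (64, 24, 48)

n_ffts = n_distance_steps * receptor_rotations * ligand_rotations
n_orientations = n_ffts * math.prod(fft_grid)

print("distance steps:", n_distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Under these assumptions the product matches the 21924 3D FFTs and 1616412672 orientations printed in the log. Note also that the log reports receptor and ligand as the same structure (2B8N) and an energy range of zero for the saved solution.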
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different host species. It would be interesting to look more
closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we
hope it will be possible in the near future to draw firm conclusions about
the true picture of evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling. We then went a step further
and presented a theoretical model built with available online modelling
tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might just be a stepping stone towards
such discoveries; it may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408. [PubMed]
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.
The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.)
Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models.
With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
2 Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31, from human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3 FASTA
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu
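To make the Smith-Waterman local alignment mentioned above more concrete, the following is a minimal sketch in Python. It is not the SSEARCH implementation: it assumes a simple match/mismatch score and a linear gap penalty (the default values are illustrative), whereas real searches use substitution matrices such as BLOSUM and affine gap penalties. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] - gap,     # gap in b
                          H[i][j - 1] - gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
    print(score)
    print(top)
    print(bottom)
```

Unlike the heuristic FASTA and BLAST searches, this dynamic-programming approach fills the whole score matrix, which is why it is guaranteed to find the optimal local alignment but is slower on large databases.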
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
5 Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
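As an illustration of one of these parameters, the aliphatic index can be computed directly from the amino acid composition. The sketch below assumes the commonly cited formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name and the dictionary layout are illustrative, and the residue counts are the ones reported for GLCTK_HUMAN in the ProtParam output of this report.

```python
def aliphatic_index(counts, total):
    """Aliphatic index from residue counts (one-letter codes) and sequence length.

    Assumed formula: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    with X expressed as mole percent (100 * count / total).
    """
    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Counts taken from the amino acid composition in the results section.
    glctk_counts = {"A": 74, "V": 38, "I": 15, "L": 81}
    print(round(aliphatic_index(glctk_counts, total=523), 2))
```

A higher aliphatic index is generally read as an indication of greater thermostability, since it reflects the relative volume occupied by aliphatic side chains.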
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described as a way to improve the success rate in the prediction of the secondary structure of proteins. Its authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search
Loaded the target sequence in PDB format in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query sequence" or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity; a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
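Since model quality is tied to sequence identity between target and template, it can help to see how that number is obtained from a pairwise alignment. The sketch below is only one possible definition: it assumes identity is counted over columns where both sequences have a residue (gap columns are skipped), and the 30% cut-off in `above_identity_rule` is the rule of thumb quoted in this report, not a hard limit. Both function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def above_identity_rule(identity, threshold=30.0):
    """True if the identity exceeds the (assumed) 30% rule-of-thumb threshold."""
    return identity > threshold


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACEEHFG")
    print(round(pid, 1), above_identity_rule(pid))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so reported identities should always state which denominator was used.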
Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features, such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles, are transferred from the templates to the target. Thus, a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
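The E-value screening described above can be sketched as a simple filter over a list of hits. The example below is illustrative only: the hit identifiers and scores are the ones listed in the results section of this report, the two E-values written as 6.0 are an assumed reading of values whose decimal point appears to have been lost, and the cut-off of 1e-5 and the function name `select_templates` are hypothetical choices rather than a recommended standard.

```python
# Hits as reported in the results section of this report.
# The two 6.0 E-values are an assumption about values printed without a decimal point.
HITS = [
    {"id": "1QGT-C", "score": 206, "evalue": 6e-54},
    {"id": "2QIJ-C", "score": 197, "evalue": 2e-51},
    {"id": "2G33-C", "score": 192, "evalue": 6e-50},
    {"id": "1TA3-B", "score": 27, "evalue": 6.0},
    {"id": "1AW9-A", "score": 27, "evalue": 6.0},
]


def select_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cut-off, best (lowest) first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["evalue"])


if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit["id"], hit["score"], hit["evalue"])
```

In practice the surviving candidates would still be compared on coverage, function and structure quality before one is chosen as the template.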
Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step, and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints and thus are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can either have a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow for docking of a larger ligand than would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller in size than the receptor interface. No docking is completely rigid, though; there is intrinsic movement which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through the interaction of many weak noncovalent bonds formed to the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the Delta G of the reaction, which results in the rate enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi against psi of amino acid residues in protein structure. It shows the possible conformations of the phi and psi angles for a polypeptide. In a polypeptide, the main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion angles phi and psi. Ramachandran used computer models of small polypeptides to systematically vary phi and psi with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. The angles which cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
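Each point on a Ramachandran plot is a pair of torsion angles, and each torsion angle is defined by four consecutive backbone atoms (for phi: C of the previous residue, N, Cα, C; for psi: N, Cα, C, N of the next residue). The sketch below shows one standard way to compute a signed torsion angle from four 3D points using only the Python standard library. The function names and the test coordinates are illustrative and are not taken from any structure in this report.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points, range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond vector.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: a right-angle twist around the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this function to every residue of a model gives the (phi, psi) pairs that validation tools plot against the allowed regions.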
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run. It also depends on how many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file which has a file extension of .new. The new file will have the atoms labelled in accordance with the IUPAC naming conventions.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and stops in a local minimum.
The potential energy calculated by summing the energies of various interactions is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, for instance two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
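The idea of gradient optimization can be illustrated on the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) term. This is a toy sketch, not the force field or minimizer used by any modelling server mentioned in this report. The reduced units (epsilon = sigma = 1), the step size and the starting distance are all assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*[(sigma/r)^12 - (sigma/r)^6] at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the negative gradient until the force is tiny."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


if __name__ == "__main__":
    start = 0.9  # atoms too close: large repulsive energy
    r_min = minimize_distance(start)
    print(lj_energy(start), lj_energy(r_min), r_min)
```

Because each step only follows the local slope downhill, the procedure ends in the nearest minimum, which is the behaviour described above for energy minimization in general.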
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2, Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
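The two extinction coefficients reported above can be reproduced from the residue counts alone. The sketch below assumes the per-residue molar extinction values at 280 nm in water that ProtParam documents (5500 for Trp, 1490 for Tyr, 125 for each cystine). The function names are illustrative, and the counts and molecular weight are those given in the ProtParam output of this report.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, cys_as_cystine=True):
    """Molar extinction coefficient (M-1 cm-1) at 280 nm from residue counts.

    Assumed per-residue values: Trp 5500, Tyr 1490, cystine 125.
    If cys_as_cystine is True, Cys residues are paired into cystines
    (an odd residue left over contributes nothing).
    """
    n_cystine = n_cys // 2 if cys_as_cystine else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_0_1_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    # Counts and molecular weight from the ProtParam output: Trp 4, Tyr 5, Cys 5.
    mw = 55252.6
    for paired in (True, False):
        eps = extinction_coefficient(4, 5, 5, cys_as_cystine=paired)
        print(eps, round(absorbance_0_1_percent(eps, mw), 3))
```

The difference between the two values comes only from the two possible cystine pairs, which is why it is small for a protein with just five cysteines.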
Secondary structure prediction
By SOPMA
Result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
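The pairwise scores in the table are percentages, so a score of 98 between two HBV strains means the two sequences are almost identical, while a score of 3 against GLCTK_HUMAN means essentially no similarity. The sketch below shows one common way such a score can be computed from two aligned rows: identities divided by the number of columns in which neither sequence has a gap. The exact denominator ClustalW2 used for this table is an assumption here, and the function name is illustrative. The example fragments are taken from the HBVA4, HBVA5 and ASHV rows of the alignment in this report.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns in which both sequences carry a residue.
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a == b)
    return 100.0 * identical / len(compared)


# Eight-column fragments of aligned rows (HBVA4 vs HBVA5, HBVA4 vs ASHV).
print(percent_identity("CPTVQASK", "CPTFQASK"))
print(percent_identity("ISCT-CPT", "FACVSCPT"))
```

Short fragments like these are only illustrative; the table values are computed over the full pairwise alignments.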
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
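The guide tree is written in Newick format, where each leaf is followed by a colon and its branch length. The sketch below extracts the leaf names and terminal branch lengths with a regular expression. It assumes the punctuation stripped from the tree (colons, commas and decimal points) is restored so that each branch length has five decimal places; that reading of the tree string is an assumption, and the helper name is illustrative.

```python
import re

# Guide tree with the assumed punctuation restored (see the note above).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a run of characters without Newick punctuation, followed by ":length".
# Internal branch lengths follow ")" and therefore never match the name part.
LEAF_PATTERN = re.compile(r"([^(),:;\s]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


leaves = leaf_branch_lengths(GUIDE_TREE)
longest = max(leaves, key=leaves.get)
print(len(leaves), "leaves; longest terminal branch:", longest, leaves[longest])
```

Under this reading, the human glycerate kinase sits on by far the longest branch, which matches its very low pairwise scores against the viral sequences.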
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - Save - Layer (pdb)
Choose icon - File - Save - Project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose icon - File - Save - Layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide by plotting the pair of angles for each residue in the protein. Because the distance between two successive alpha carbon atoms in the backbone chain is nearly fixed, these two angles per residue largely describe the course of the backbone in the protein of interest.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
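Each point on a Ramachandran plot is a pair of dihedral angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python; the function name and the test coordinates are made up for illustration and are not taken from the modeled structure.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_length = math.sqrt(dot(b2, b2))
    # atan2 keeps the sign, so the result covers the range -180 to 180 degrees.
    return math.degrees(math.atan2(b2_length * dot(b1, n2), dot(n1, n2)))


# Hypothetical coordinates: a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real model, the coordinates would be read from the ATOM records of the PDB file, residue by residue.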
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
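The percentages in the plot statistics can be recomputed from the residue counts, and the result compared against the 90% guideline quoted above. A short sketch (the variable names and the helper are illustrative):

```python
# Residue counts per Ramachandran region, from the Procheck summary above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
GOOD_MODEL_THRESHOLD = 90.0  # per cent in most favoured regions


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


total_residues = sum(REGION_COUNTS.values())
percentages = region_percentages(REGION_COUNTS)
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, value in percentages.items():
    print(f"{region}: {value:.1f}%")
print("meets the 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the 51 residues in disallowed regions would be the first place to look when refining it.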
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor: 2B8N and ligand: 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
-------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
-------------------------------------------------------------------------
Clst  Soln  Models   Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
-------------------------------------------------------------------------
1     1     000/000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
-------------------------------------------------------------------------
1     1     000/000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
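The search-space figures reported in the Hex log can be cross-checked with simple arithmetic. The sketch below assumes the Iterate and FFT brackets in the log read 27, 812, 1 and 64, 24, 48, and that the distance range runs from 0.00 to 19.50 in steps of 0.75; this reading of the stripped digits is an assumption, chosen because it reproduces the totals printed in the log.

```python
from math import prod

# Assumed reading of the log entries (punctuation was lost in the report).
ITERATE = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
FFT_GRID = (64, 24, 48)  # samples handled by each 3D FFT
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75

# Number of intermolecular distances scanned, including both end points.
distance_steps = round((R_MAX - R_MIN) / R_STEP) + 1

# One 3D FFT per combination of the iterated coordinates.
n_ffts = prod(ITERATE)

# Every FFT evaluates one orientation per grid sample.
total_orientations = n_ffts * prod(FFT_GRID)

print(distance_steps, n_ffts, total_orientations)
```

Under these assumptions the numbers agree with the log: 27 distance steps, 21924 FFTs and 1616412672 orientations in total.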
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to look more closely
at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will become possible to draw firm conclusions about the true picture of
its evolution in the near future, and to identify the genes responsible
for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries, although it is only a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. The same method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions
and virtual screening enrichments obtained from protein-ligand docking.
This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance, or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
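To make the idea of an optimal local alignment more concrete, the following is a minimal sketch of the Smith-Waterman scoring recurrence. It is not the SSEARCH implementation: the function name smith_waterman_score, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for illustration, whereas real tools use substitution matrices and more elaborate gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: identical letters score `match`, different
    letters score `mismatch`, and each gap position costs `gap`.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only two rows of the table are kept in memory because the sketch reports the score alone; recovering the aligned residues would require storing the full table and tracing back from the best cell.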
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary and secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
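As an illustration of how one of these parameters is derived from the sequence alone, the sketch below computes the aliphatic index using the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name aliphatic_index is an assumption for illustration, and the code is not ProtParam itself.

```python
def aliphatic_index(sequence):
    """Aliphatic index of a protein sequence given in one-letter code.

    Uses AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    with X expressed as mole percent (100 * count / length).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy four-residue sequence, used only to show the arithmetic.
    print(aliphatic_index("AVIL"))
```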
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) by a BLAST similarity search.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and the other checks available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran the docking in HEX and obtained the structure of the drug molecule.
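The second step of the protocol retrieves a sequence in FASTA format. As a hedged illustration of what that format looks like in practice, the sketch below parses FASTA text into a dictionary. The function name parse_fasta and the header text are assumptions for illustration; the residues shown are the first 60 positions of GLCTK_HUMAN as they appear in the alignment later in this report.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}.

    A header line starts with '>'; all following lines up to the next
    header are joined into one sequence string.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = """>Q8IVS8|GLCTK_HUMAN (first 60 residues only)
MAAALQVLPRLARAPLHPLLWRGSVARLAS
SMALAEQARQLFESAVGAVLPGPMLHRALS
"""
    for name, seq in parse_fasta(example).items():
        print(name, len(seq))
```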
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
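Because model quality is tied so closely to sequence identity, it can help to see how identity is computed from a pairwise alignment. The sketch below is one possible definition, assuming that columns containing a gap in either sequence are ignored; the function name percent_identity is hypothetical, and other tools may normalise by a different length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are skipped, and the
    result is 100 * identical columns / compared columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments, used only to show the calculation.
    print(percent_identity("MQLF-HLCL", "MYLFAHLCL"))
```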
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
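The E-value rule described above can be expressed as a simple filter. The sketch below is only an illustration: the function name select_templates, the dictionary layout of a hit, the hit identifiers and the cutoff of 1e-5 are all assumptions, not values taken from BLAST or from this study.

```python
def select_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cutoff, best first.

    Each hit is a dict with hypothetical keys 'id' and 'evalue'.
    The default cutoff is an illustrative assumption.
    """
    accepted = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(accepted, key=lambda hit: hit["evalue"])


if __name__ == "__main__":
    # Hypothetical search results, for demonstration only.
    example_hits = [
        {"id": "hitA", "evalue": 1e-50},
        {"id": "hitB", "evalue": 2.0},
        {"id": "hitC", "evalue": 1e-60},
    ]
    for hit in select_templates(example_hits):
        print(hit["id"], hit["evalue"])
```

A hit rejected by such a filter is not necessarily useless, which is why the text recommends fold-recognition or consensus meta-servers for marginal cases rather than lowering the bar.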
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor, or a
flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
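The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be sketched as a coordinate transform applied to the ligand, followed by a crude steric clash count. This is not how HEX works internally: the function names, the rotation order Rz * Ry * Rx and the 3.0 Å clash cutoff are assumptions made for illustration.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix R = Rz(rz) * Ry(ry) * Rx(rx); angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(coords, rx, ry, rz, tx, ty, tz):
    """Rotate each (x, y, z) point, then translate it by (tx, ty, tz)."""
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved


def count_clashes(receptor, ligand, cutoff=3.0):
    """Number of receptor-ligand atom pairs closer than `cutoff` (assumed, in Å)."""
    return sum(1 for a in receptor for b in ligand if math.dist(a, b) < cutoff)


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    posed = transform(ligand, 0.0, 0.0, math.pi / 2, 0.0, 0.0, 0.0)
    print(posed, count_clashes(receptor, posed))
```

A rigid-body docking search explores many such poses and scores each one; flexibility in the receptor or ligand adds further degrees of freedom on top of these six.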
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A receptor is a structure on the surface of a cell that serves as a recognition
or binding site for antigens, antibodies or other cellular or immunological
components. It is a molecule within a cell or on a cell surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change
in the activity of the cell.
Ligand
A ligand is a molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ, with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
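Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms: φ by C(i-1), N, Cα and C, and ψ by N, Cα, C and N(i+1). The sketch below computes a signed torsion angle from four 3D points. The function name dihedral and the toy coordinates are assumptions for illustration; they are not taken from any structure in this report.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180] about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: a planar cis arrangement and a planar trans arrangement.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Applying such a function along a backbone gives one (φ, ψ) pair per residue, which is what validation tools plot against the sterically allowed regions.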
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots, together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
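Gradient-based minimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones (van der Waals) potential, starting too close together. The sketch below uses plain steepest descent in reduced units (epsilon = sigma = 1); the function names, step size and tolerance are assumptions for illustration and do not correspond to any particular force field or minimizer.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r, step=0.001, tol=1e-10, max_iter=100000):
    """Steepest descent on the interatomic distance r.

    Moves r against the gradient until the force is (nearly) zero.
    Step size and tolerance are illustrative choices.
    """
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


if __name__ == "__main__":
    start = 1.0                      # atoms closer than the optimum
    best = minimize_distance(start)
    print(best, lj_energy(start), lj_energy(best))
```

The minimizer settles at the nearest minimum, r = 2^(1/6) sigma for this potential, which mirrors the point made above: minimization relieves local strain but does not cross energy barriers to search for other minima.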
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I             27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
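The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below assumes the commonly used molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 per cystine (a pair of Cys residues); the function name extinction_coefficient is hypothetical. The residue counts (Tyr 5, Trp 4, Cys 5) and the molecular weight come from the ProtParam output above.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, cys_as_cystine=True):
    """Estimated molar extinction coefficient at 280 nm (M-1 cm-1).

    Assumed per-residue values: Tyr 1490, Trp 5500, cystine 125.
    With cys_as_cystine=True every pair of Cys counts as one cystine.
    """
    n_cystine = n_cys // 2 if cys_as_cystine else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


if __name__ == "__main__":
    molecular_weight = 55252.6   # from the ProtParam result above
    for paired in (True, False):
        coeff = extinction_coefficient(5, 4, 5, cys_as_cystine=paired)
        # Abs 0.1% is the absorbance of a 1 g/l solution: coefficient / MW.
        print(paired, coeff, round(coeff / molecular_weight, 3))
```

With five Cys residues only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.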
Secondary structure prediction
By SOPMA
Result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh):  235 is 44.93%
3-10 helix      (Gg):    0 is  0.00%
Pi helix        (Ii):    0 is  0.00%
Beta bridge     (Bb):    0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn       (Tt):   36 is  6.88%
Bend region     (Ss):    0 is  0.00%
Random coil     (Cc):  172 is 32.89%
Ambiguous states    :    0 is  0.00%
Other states        :    0 is  0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
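The percentages in the SOPMA summary follow directly from the per-state residue counts. As a small check, the sketch below recomputes them from the four non-zero counts reported above; the function name state_percentages is an assumption for illustration.

```python
def state_percentages(counts):
    """Convert per-state residue counts into percentages of the total."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


if __name__ == "__main__":
    # Counts taken from the SOPMA result above (sequence length 523).
    sopma_counts = {
        "alpha helix": 235,
        "extended strand": 80,
        "beta turn": 36,
        "random coil": 172,
    }
    for state, pct in state_percentages(sopma_counts).items():
        print(state, pct)
```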
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
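The scores in the table above are pairwise scores reported by the alignment program, read here as percent identities. As a rough illustration of how such a value can be derived from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions made for illustration; the program's own scoring may differ in detail. The two fragments are the first aligned residues of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5 from the alignment in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Fragments copied from the alignment (illustrative subset, not the full sequences).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"

identity = percent_identity(hbva4, hbva5)
print(f"{identity:.1f}")  # 28 of 29 compared residues match
```

Because only a short fragment is compared, the value is not expected to equal the full-length score in the table.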
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
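The guide tree above is a Newick-style tree whose punctuation was lost in extraction. The sketch below assumes the colons, commas, and decimal points are restored so that each six-digit run reads as a branch length of the form 0.xxxxx. It parses the tree with a small hand-written parser and sums branch lengths from the root to each leaf. The function names `parse_newick` and `leaf_depths` are hypothetical and are not part of any tool used in this report.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        # Optional node name, read up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()


def leaf_depths(node, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = node
    depth += length or 0.0
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


# Reconstructed guide tree (decimal placement is an assumption).
guide_tree = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

tree = parse_newick(guide_tree)
depths = leaf_depths(tree)
for leaf, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {depth:.5f}")
```

Under this reading, the human glycerate kinase sequence sits on by far the longest path from the root, which is consistent with its very low pairwise scores against the HBeAg sequences.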
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the allowed regions point to likely stereochemical problems in the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
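To make the quantities on a Ramachandran plot concrete, the sketch below computes a signed dihedral angle from four atom positions and uses it to obtain φ (C of the previous residue, N, CA, C) and ψ (N, CA, C, N of the next residue) for one residue. It is a minimal illustration using only the standard library. The helper names and the coordinates in the usage example are made up for demonstration and are not taken from the modeled structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles for one residue: phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry checks (not real protein coordinates).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
perp = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
print(round(abs(cis), 1), round(abs(trans), 1), round(perp, 1))
```

In a real analysis the five input points would come from the ATOM records of consecutive residues in the model's PDB file, and each (φ, ψ) pair would become one point on the plot.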
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
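The percentages in the Procheck summary follow directly from the residue counts. The short sketch below recomputes them from the four region counts and compares the most favoured fraction with the 90% guideline quoted above. The dictionary keys and variable names are chosen for illustration only.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())  # non-glycine, non-proline residues
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")

# Guideline from the Procheck text: a good model has over 90% most favoured.
meets_guideline = percentages["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the guideline, so the regions containing the outlying residues would be natural candidates for refinement.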
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
The file 2B8N.pdb was then loaded a second time as the ligand, and Hex repeated the same "Can't add all hydrogens" and "Fractional charge" warnings for the same residues, again ending with:
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair               RMS  Bumps
 ----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
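The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below assumes that the stripped figures "278121" and "642448" read as Iterate[27,812,1] and FFT[64,24,48], and that the distance range runs from 0.00 to 19.50 in steps of 0.75. Under that reading the products reproduce both the number of 3D FFTs and the total number of orientations given in the log. The function and variable names are illustrative and are not Hex parameters.

```python
def distance_steps(start: float, stop: float, step: float) -> list:
    """Intermolecular distances sampled from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


# Values read from the log (decimal points and commas restored by assumption).
steps = distance_steps(0.0, 19.5, 0.75)
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

# One 3D FFT per (distance step, receptor rotation, ligand rotation) triple.
num_ffts = len(steps) * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the grid as one candidate orientation.
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = num_ffts * orientations_per_fft

print(len(steps), num_ffts, total_orientations)
```

The agreement with the logged totals (21924 FFTs, 1616412672 orientations) supports this reading of the garbled figures, although it does not confirm how Hex labels each dimension internally.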
Conclusion
After analyzing the protein sequences of Hepatitis B virus strains, we
conclude that although they are all closely related, each plays an important
role in survival in its host species. It would be interesting to take a closer
look at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of evolution,
and the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is a secondary cause of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation and distribution of
information related to biological macromolecules such as DNA, RNA and
proteins.
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level
of redundancy and a high level of integration with other
databases.
2 Protein sequence
Glycerate kinase (HBeAg-binding protein 4). Primary accession number: Q8IVS8. EC 2.7.1.31. Organism: human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.
3 FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved tools for biological sequence
comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet; it is an extension of the "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment programs.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
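The Smith-Waterman idea mentioned above can be illustrated with a minimal sketch. This is not the SSEARCH implementation: the function name and the match, mismatch and linear gap scores are assumptions chosen for illustration, whereas real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    previous = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + pair   # align a[i-1] with b[j-1]
            up = previous[j] + gap              # gap in b
            left = current[j - 1] + gap         # gap in a
            # Local alignment: a cell score never drops below zero.
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTTGCA", "TTACGTAA"))
```

Because every cell is floored at zero, the score reflects the best-matching local region rather than the whole sequences, which is what distinguishes local from global alignment.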
4 BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5 Primary and secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
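As an illustration of one of these parameters, the sketch below computes the aliphatic index from residue counts. It assumes the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue; the function name is hypothetical and the counts are those listed for GLCTK_HUMAN in the results section of this report.

```python
def aliphatic_index(counts):
    """Aliphatic index from a dict mapping one-letter residue codes to counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty composition")

    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    # Coefficients 2.9 and 3.9 weight the side-chain volumes of Val and
    # Ile/Leu relative to Ala (assumed formula, see lead-in).
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Residue counts for GLCTK_HUMAN (523 residues) as reported in this document.
GLCTK_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}

if __name__ == "__main__":
    print(round(aliphatic_index(GLCTK_COUNTS), 2))
```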
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM)
was described to improve the success rate in the prediction of the
secondary structure of proteins. It was later improved
by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural network method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by e-mail to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
2) Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
3) Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N).
4) Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor.
5) Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS).
6) Verified the model using the different parameters available in SAVS, such as the Ramachandran plot.
7) Selected the best ligand for HBV disease from the KEGG database.
8) Ran HEX and obtained the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
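Since template choice depends on sequence identity between query and template, a small sketch of how percent identity can be computed from a pairwise alignment may help. The function name is hypothetical, and the convention used here (identical positions divided by positions where neither sequence has a gap) is an assumption; other tools divide by alignment length or by the shorter sequence length. The two fragments are taken from the ClustalW alignment shown later in this report.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are not counted (assumed convention)
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Aligned N-terminal fragments of HBEAG_HBVA4 and HBEAG_ASHV.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
    ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
    print(round(percent_identity(hbva4, ashv), 1))
```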
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described by
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
a receptor to bind a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals repulsion can be decreased, the energy penalty in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing easier,
more energetically favourable binding between the two.
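The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be sketched as a coordinate transform. This is an illustrative sketch only, not how HEX or any other docking program is implemented; the function names and the rotation order (about x, then y, then z) are assumptions.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 matrix for rotation about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Product Rz * Ry * Rx written out explicitly.
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def place_ligand(coords, rotation, translation):
    """Apply one rigid-body pose (3 rotations + 3 translations) to atom coordinates."""
    r = rotation_matrix(*rotation)
    placed = []
    for x, y, z in coords:
        placed.append(tuple(
            r[i][0] * x + r[i][1] * y + r[i][2] * z + translation[i]
            for i in range(3)
        ))
    return placed


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # hypothetical atom positions
    pose = place_ligand(ligand, (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
    print(pose)
```

A rigid-body docking search can be thought of as scoring many such poses; distances between ligand atoms are unchanged by the transform, which is what "rigid" means here.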
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study may be
useful in the search for a cure for hepatitis B, and may also open
new avenues for finding other possible drug targets in HBV. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
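Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms. The sketch below shows one standard way to compute such a dihedral angle from Cartesian coordinates; the function names are hypothetical and the coordinates in the usage example are toy values, not taken from the modelled structure. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(c * _dot(b0, b1) for c in b1))
    w = _sub(b2, tuple(c * _dot(b2, b1) for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the outer bonds are perpendicular when viewed down the axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```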
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
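Gradient-based minimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones (van der Waals) potential. This is a toy sketch under stated assumptions, not the force field or minimizer used by any modelling server; the function names, reduced units (epsilon = sigma = 1), step size and starting distance are all chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_distance(r, step=0.01, tolerance=1e-8, max_iterations=10000):
    """Steepest descent: move downhill until the force is nearly zero."""
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient  # step against the gradient, i.e. along the force
    return r


if __name__ == "__main__":
    r_min = minimize_distance(1.5)
    print(r_min, lj_energy(r_min))
```

Starting from a separation of 1.5, the descent settles at the nearest minimum of the potential. As the text notes, such a procedure only finds the closest minimum and cannot cross energy barriers.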
Result and discussion
1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2, evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I               27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33      6.3%
Asn (N)   11      2.1%
Asp (D)   21      4.0%
Cys (C)   5       1.0%
Gln (Q)   32      6.1%
Glu (E)   28      5.4%
Gly (G)   51      9.8%
His (H)   16      3.1%
Ile (I)   15      2.9%
Leu (L)   81      15.5%
Lys (K)   10      1.9%
Met (M)   12      2.3%
Phe (F)   10      1.9%
Pro (P)   28      5.4%
Ser (S)   27      5.2%
Thr (T)   22      4.2%
Trp (W)   4       0.8%
Tyr (Y)   5       1.0%
Val (V)   38      7.3%
Pyl (O)   0       0.0%
Sec (U)   0       0.0%
(B)       0       0.0%
(Z)       0       0.0%
(X)       0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C   2435
Hydrogen  H   3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S   17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
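The two extinction coefficients above can be checked with a short sketch. It assumes the commonly used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine (disulfide pair), and reads the reported molecular weight as 55252.6; the function names are hypothetical. The residue counts (Trp 4, Tyr 5, Cys 5) come from the composition reported in this section.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M-1 cm-1 (assumed constants)."""
    # Two Cys residues form one cystine; an unpaired Cys contributes nothing.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_at_1_g_per_l(coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight as read from the ProtParam output
    for paired in (True, False):
        ext = extinction_coefficient(4, 5, 5, paired)
        print(ext, round(absorbance_at_1_g_per_l(ext, mw), 3))
```

Under these assumptions the sketch gives 29700 and 0.538 when all Cys residues are counted as half cystines, and 29450 and 0.533 when none are, in agreement with the reported values.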
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):       0 is 0.00%
Pi helix (Ii):         0 is 0.00%
Beta bridge (Bb):      0 is 0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn (Tt):        36 is 6.88%
Bend region (Ss):      0 is 0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):  0 is 0.00%
Other states:          0 is 0.00%
Parameters: window width 17; similarity threshold 8; number of states 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
58
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
59
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
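To show how the alignment above can be read quantitatively, the following minimal Python sketch computes the percent identity between two aligned rows. The function name `percent_identity` and the decision to skip any column that holds a gap in either row are assumptions made for this illustration, not part of the original analysis. The two rows are copied from the alignment block that ends at residue 117.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Rows for P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5 from one alignment block.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
hbva5 = "QAILCWGKLMTLATWVGNNLEDPASRDLVVNY" + "-" * 28

print(f"{percent_identity(hbva4, hbva5):.1f}% identity in this block")
```

Applied to complete rows rather than a single block, the same function would give a whole-sequence identity for any pair in the alignment.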
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
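The guide tree above can be explored programmatically. The sketch below assumes that the tree is Clustal's Newick output and that the colons, commas, and decimal points were lost when the report was extracted, so each run of digits after a name is read as a branch length such as 0.59519. The functions `parse_newick` and `leaf_distances` are hypothetical helpers written for this example; they sum the branch lengths from the root to every sequence.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos < len(text) and text[pos] == ",":
                    pos += 1  # another sibling follows
                elif pos < len(text) and text[pos] == ")":
                    pos += 1  # end of this group
                    break
                else:
                    raise ValueError(f"malformed Newick near position {pos}")
        name = read_until(",():;")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",();"))
        return (name, length, children)

    return parse_node()


def leaf_distances(node, depth=0.0):
    """Yield (leaf_name, summed branch length from the root) for every leaf."""
    name, length, children = node
    if not children:
        yield name, depth + length
    for child in children:
        yield from leaf_distances(child, depth + length)


# Guide tree with the assumed punctuation restored.
guide_tree = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

distances = dict(leaf_distances(parse_newick(guide_tree)))
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:20s} {dist:.5f}")
```

Under these assumptions the human glycerate kinase sequence lies far from all the viral sequences, while the HBV sequences cluster close to the root.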
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure, with φ plotted against ψ. It shows which combinations of φ and ψ are possible for a polypeptide, so residues that fall outside the allowed regions point to likely errors in the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
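The φ and ψ angles that a Ramachandran plot displays are ordinary dihedral angles defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ, and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below is a minimal illustration of that calculation in plain Python. The function name `dihedral_degrees` and the toy coordinates are assumptions made for this example; this is not the code used by SAVS or Procheck.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical, not taken from the modeled structure).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

For a real model, the four points would be the atom coordinates read from the PDB file, and each residue with both neighbours present would contribute one (φ, ψ) pair to the plot.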
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics
                                                          SCORE    %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
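The percentages in the Procheck summary follow directly from the residue counts, which also gives a quick consistency check on the numbers. The short Python sketch below recomputes them and compares the most favoured fraction with the 90% guideline quoted above. The variable names are chosen for this illustration only.

```python
# Residue counts taken from the Procheck plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the basis for the percentages.
total = sum(counts.values())
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Procheck's rule of thumb: a good model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the 51 residues in disallowed regions are the natural place to start any refinement.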
Docking result (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
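The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below assumes that separators and decimal points were lost in extraction, so that the distance range reads 0.00 to 19.50 in steps of 0.75 and "Iterate[278121] x FFT[642448]" reads Iterate[27, 812, 1] x FFT[64, 24, 48]. That reading is an inference, chosen because it agrees with the 812 receptor and 1 ligand rotational increments listed in the log and reproduces both printed totals. The variable names are illustrative.

```python
import math

# Radial steps: 0.00 to 19.50 in steps of 0.75, both ends included (assumed decimals).
distance_steps = round((19.50 - 0.00) / 0.75) + 1

# Rotational increments as listed in the log (N=16).
receptor_rotations = 812
ligand_rotations = 1

# Assumed dimensions of each 3D FFT grid.
fft_grid = (64, 24, 48)

# One 3D FFT per combination of distance step and rotational increments.
fft_count = distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores every point of the grid, i.e. one trial orientation per grid point.
orientations = fft_count * math.prod(fft_grid)

print(distance_steps, fft_count, orientations)
```

The results match the log's "Done 21924 3D FFTs for 1616412672 orientations", which supports the assumed reading of the garbled bracketed values.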
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
by studying them at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution in the near future, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling and went a step further by
presenting a theoretical model built with available online modelling tools.
The HBeAg-binding protein 4 (glycerate kinase) studied here is encoded by a
gene and is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.
Swiss-Prot
Swiss-Prot is a curated protein sequence database which strives to
provide a high level of annotation (such as the description
of the function of a protein, its domain structure,
post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other
databases.
2. Protein sequence
Protein: Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Source: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet; it is an extension of the "FAST-P"
(protein) and "FAST-N" (nucleotide) alignment tools.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
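The Smith-Waterman recurrence mentioned above can be illustrated with a short sketch. This is not the SSEARCH implementation: the function name and the simple match/mismatch/linear-gap scores are assumptions chosen for illustration, whereas real protein searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only (assumed values): +2 match, -1 mismatch,
    -2 per gap position (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTTT", "GGACGGG")
```

Only the best score is tracked here; recovering the aligned residues would additionally require a traceback through the full matrix.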
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
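As an illustration of one of these parameters, the aliphatic index can be sketched from residue counts alone. The sketch assumes the commonly cited definition (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu); the function name is hypothetical, and the counts are the Ala, Val, Ile and Leu counts reported for the 523-residue GLCTK_HUMAN sequence in the results of this report.

```python
def aliphatic_index(counts, length):
    """Aliphatic index from residue counts (assumed formula, see lead-in).

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue.
    """
    mole_pct = {aa: 100.0 * counts.get(aa, 0) / length for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


# Counts taken from the amino acid composition reported in the results section.
glctk_counts = {"A": 74, "V": 38, "I": 15, "L": 81}
ai = aliphatic_index(glctk_counts, 523)
```

The value is computed here rather than quoted, since the report does not list the aliphatic index that ProtParam returned.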
Using SOPMA for secondary structure analysis
A method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. Its authors report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model against different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
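The E-value screening described above can be sketched as a simple filter over a list of hits. The `Hit` type, the `select_templates` function and the cutoff of 1e-5 are assumptions for illustration, not a recommended threshold; the first three hits reuse the PDB identifiers, scores and E-values listed in the results of this report, and `WEAK-X` is a hypothetical poor hit.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template: str   # PDB identifier and chain of the candidate template
    score: float    # alignment score reported by the search
    evalue: float   # expected number of chance hits with this score


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


hits = [
    Hit("2QIJ-C", 197, 2e-51),
    Hit("1QGT-C", 206, 6e-54),
    Hit("2G33-C", 192, 6e-50),
    Hit("WEAK-X", 27, 6.0),   # hypothetical weak hit, expected to be rejected
]
candidates = select_templates(hits)
```

In practice the surviving candidates would still be compared on coverage, function and structure quality before one is chosen.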
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step, and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angle phi (φ) against psi (ψ) for
the amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii, and the angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
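The torsion angles plotted in a Ramachandran diagram are each defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The following is a minimal sketch of the underlying dihedral-angle calculation; the function name and the toy coordinates are illustrative assumptions, not output from any validation server.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    def sub(u, v):
        return tuple(a - b for a, b in zip(u, v))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = dot(b2, cross(n1, n2))
    x = math.sqrt(dot(b2, b2)) * dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the central bond lies along the z axis.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying the function to every residue of a model and plotting φ against ψ gives the scatter of points that a Ramachandran plot displays.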
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
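The idea of gradient optimization can be sketched on the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) term. The reduced units, the step size, the tolerance and the function names are assumptions for illustration; real minimizers act on all atomic coordinates of a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until the force is tiny."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


start = 0.95                 # atoms too close: strong repulsion
r_min = minimize(start)      # relaxes to the nearby minimum
```

The descent only walks downhill from the starting distance, which mirrors the point made above that minimization finds the nearest local minimum rather than crossing energy barriers.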
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                              Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5    206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina    197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad           192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I    27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
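The atomic composition can be cross-checked with a few lines of arithmetic. The sketch below sums the atom counts of the reported formula and estimates the molecular weight from standard average atomic masses; the mass table and the variable names are assumptions introduced here, and the small difference from the ProtParam value is expected from rounding of the atomic masses.

```python
# Standard average atomic masses (assumed values, in daltons).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Atom counts from the formula reported above: C2435 H3967 N711 O719 S17.
formula = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}

total_atoms = sum(formula.values())
estimated_mass = sum(AVERAGE_MASS[element] * n for element, n in formula.items())
```

Both quantities agree with the values listed above: the atom counts add up to the reported total, and the estimated mass falls within a fraction of a dalton of the reported molecular weight.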
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
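These two values can be reproduced from the residue counts. The sketch assumes the widely used molar extinction coefficients at 280 nm of 1490 for Tyr, 5500 for Trp and 125 for each cystine; the function name is hypothetical, and the counts (5 Tyr, 4 Trp, 5 Cys) and the molecular weight are those reported above.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, cystines=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1), assumed constants.

    If cystines is True, Cys residues are paired into cystines
    (an odd Cys is left unpaired); otherwise no cystines are counted.
    """
    n_cystine = n_cys // 2 if cystines else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


MOLECULAR_WEIGHT = 55252.6  # reported by ProtParam for GLCTK_HUMAN

ext_all_cystine = extinction_coefficient(5, 4, 5, cystines=True)
ext_no_cystine = extinction_coefficient(5, 4, 5, cystines=False)
abs_all_cystine = round(ext_all_cystine / MOLECULAR_WEIGHT, 3)   # Abs 0.1% (1 g/l)
abs_no_cystine = round(ext_no_cystine / MOLECULAR_WEIGHT, 3)
```

The absorbance of a 1 g/l solution is simply the molar coefficient divided by the molecular weight, which is why the two assumptions about cysteine differ only in the third decimal place.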
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
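The percentages in this summary follow directly from the per-state residue counts and the sequence length. A minimal sketch of that arithmetic, using only the non-zero states reported above (the variable names are illustrative):

```python
# Residue counts per predicted state, as reported by SOPMA above.
state_counts = {
    "Alpha helix": 235,
    "Extended strand": 80,
    "Beta turn": 36,
    "Random coil": 172,
}
sequence_length = 523

# Percentage of the sequence assigned to each state, to two decimals.
state_percent = {
    state: round(100 * n / sequence_length, 2) for state, n in state_counts.items()
}
assigned = sum(state_counts.values())
```

The four states account for every residue, so the prediction leaves no position unassigned.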
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
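Pairwise scores of this kind are commonly read as percent identity between two aligned sequences. The sketch below assumes that reading (identical positions divided by positions where neither sequence has a gap); ClustalW's exact scoring may differ in detail, and the function name and the two short aligned strings are toy examples rather than rows from this alignment.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue            # skip columns where either sequence has a gap
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned pair: 7 gap-free columns, 5 of them identical.
toy_score = percent_identity("MQLF-HLC", "MYLFAHIC")
```

Under this reading, the near-identical HBV genotype sequences score in the high nineties, while the human glycerate kinase shares almost no identity with any HBeAg sequence.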
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
62
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
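A guide tree of this kind is conventionally written in Newick format, where each leaf label is followed by a colon and its branch length. As a minimal sketch (the function name and the three-leaf example tree are hypothetical and are not taken from the alignment output), the leaf labels and branch lengths of such a string can be read with a regular expression in Python:

```python
import re

def leaf_branch_lengths(newick):
    """Return {leaf label: branch length} for a Newick string.

    A leaf is recognised as a label that directly follows '(' or ','.
    Internal nodes follow ')' and are therefore skipped.
    """
    pattern = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}

# Hypothetical three-leaf tree, used only for illustration.
example_tree = "((A:0.1,B:0.2):0.3,C:0.4);"
print(leaf_branch_lengths(example_tree))
```

Labels containing the "|" character, such as the database identifiers used in this report, are accepted by the same pattern.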
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% sequence identity with the target sequence, and the template structure was downloaded from the PDB.
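The sequence identity quoted for a template is usually computed from a pairwise alignment. The Python sketch below shows one common convention, counting identical residues over the columns where neither sequence has a gap. The function name is hypothetical, and other tools may use a different denominator (for example the full alignment length), so the value can differ slightly from what a given server reports.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# C-terminal fragments of two HBeAg sequences from the alignment in this report
# (P17099|HBEAG_HBVA4 and P0C625|HBEAG_HBVA3); they differ at one position.
hbva4 = "RRRSQSPRRRRSQSRESQC"
hbva3 = "RRRSPSPRRRRSQSRESQC"
print(round(percent_identity(hbva4, hbva3), 1))
```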
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - save - layer (pdb)
Choose icon - File - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the email ID for receiving the modeled structure)
Open the new structure (received by email) and remove the template by selecting the target
Choose icon - File - save - layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as a project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows which combinations of φ and ψ are possible for a polypeptide, and it displays the φ and ψ conformational angles for each residue in a protein, so that residues with an unfavourable backbone conformation in the modeled protein can be identified.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using Deep View and Modeler is given as input for SAVS.
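To make the quantities on the Ramachandran plot concrete, the Python sketch below computes a dihedral angle from four atom positions. For φ the four atoms are C(i-1), N, CA and C of residue i; for ψ they are N, CA, C and N(i+1). The function name and the test coordinates are illustrative assumptions. This is not the code used by SAVS or Procheck.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    d0 = _dot(b0, axis)
    v = (b0[0] - d0 * axis[0], b0[1] - d0 * axis[1], b0[2] - d0 * axis[2])
    d2 = _dot(b2, axis)
    w = (b2[0] - d2 * axis[0], b2[1] - d2 * axis[1], b2[2] - d2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))

# Illustrative coordinates only: a cis arrangement gives an angle of 0 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applying such a function to every residue of a structure gives the (φ, ψ) pairs that are plotted against each other.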
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics

                                                       Score    %-age
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
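The percentages in the Procheck summary follow directly from the residue counts, taken over the non-glycine, non-proline residues. The short Python sketch below recomputes them from the four counts reported above and checks the "over 90% in the most favoured regions" guideline. The function and variable names are hypothetical and only serve this illustration.

```python
def region_percentages(counts):
    """Percentage of residues in each Ramachandran region, to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Residue counts from the Procheck plot statistics of this model.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = region_percentages(counts)
print(percentages)
# Guideline quoted by Procheck: over 90% in the most favoured regions.
print("meets 90% guideline:", percentages["most favoured"] > 90.0)
```

For this model the most favoured fraction stays below the 90% guideline, so the check reports False.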
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of evolution in
the near future, and the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
In order to find the unknown structure of a protein present in the different
species, we performed homology modelling. As a step forward, we present a
theoretical model built using available online modelling tools.
In our study, the HBeAg-binding protein (glycerate kinase), which is encoded
by a gene, is regarded as a secondary cause of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries. It is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Bioinformatics is limited to sequence, structural and functional analysis of
genes and genomes and their corresponding products, and is often considered
computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis and molecular functional analysis.
1 NCBI
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome data,
and disseminates biomedical information, all for the better understanding
of molecular processes affecting human health and disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein, its
domains structure, post-translational modifications, variants, etc.), a
minimal level of redundancy and a high level of integration with other
databases.
2 Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8. EC 2.7.1.31. Organism: human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The original
FASTP program was designed for protein sequence similarity searching.
FASTA, described in 1988 ("Improved Tools for Biological Sequence
Comparison"), added the ability to do DNA:DNA searches and translated
protein:DNA searches, and also provided a more sophisticated shuffling
program for evaluating statistical significance. There are several programs in
this package that allow the alignment of protein sequences and DNA
sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All",
because it works with any alphabet, an extension of "FAST-P" (protein) and
"FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
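To make the idea of an optimal local alignment concrete, the following is a minimal sketch of the Smith-Waterman recurrence in Python. It is not the SSEARCH implementation: the function name smith_waterman and the simple scoring scheme (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, whereas real tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Illustrative scoring only: identity scoring and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXACGTXX", "YYACGTYY"))  # (8, 'ACGT', 'ACGT')
```

The zero floor in the recurrence is what distinguishes local from global alignment: unrelated flanking regions reset the score instead of dragging it down.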
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure analysis
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can either be specified as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described as a way to
improve the success rate in the prediction of the secondary structure of
proteins. Its authors report improvements brought about by predicting all
the sequences of a set of aligned proteins belonging to the same family.
This improved SOPM method (SOPMA) correctly predicts 69.5% of amino
acids for a three-state description of the secondary structure (α-helix,
β-sheet and coil) in a whole database containing 126 chains of
non-homologous (less than 25% identity) proteins. Joint prediction with
SOPMA and a neural networks method (PHD) correctly predicts 82.2% of
residues for 74% of co-predicted amino acids. Predictions are available by
email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID: 2B8N) using a BLAST similarity search.
Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model using different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
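Since the discussion above ties expected model accuracy to sequence identity, a small sketch of how percent identity might be computed from a pairwise alignment can be useful. The function name percent_identity is hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one of several conventions in use; other tools divide by alignment length or by the shorter sequence length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment and therefore
    have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGT", "ACGA"))  # 75.0
```

Whichever convention is chosen, it should be stated alongside the reported value, because the same alignment can give noticeably different percentages under different denominators.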
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
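The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be illustrated with a short sketch that applies a rotation and a translation to a set of atom coordinates. This is not how HEX or any particular docking program represents poses; the function names and the x-then-y-then-z rotation order are assumptions made for the example.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(coords, rx, ry, rz, tx, ty, tz):
    """Apply a rigid-body move (3 rotations + 3 translations) to (x, y, z) points."""
    r = rotation_matrix(rx, ry, rz)
    t = (tx, ty, tz)
    return [
        tuple(sum(r[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in coords
    ]


# Hypothetical ligand atoms: rotate 90 degrees about z, then shift by 5 along x.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = transform(ligand, 0.0, 0.0, math.pi / 2, 5.0, 0.0, 0.0)
```

A rigid-body search explores only these six parameters, which is why distances between atoms within the moved molecule are unchanged; flexible docking adds internal degrees of freedom on top of them.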
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components. It
is a molecule within or on the surface of a cell to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
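As an illustration of the torsion angles plotted in a Ramachandran diagram, here is a minimal sketch that computes a dihedral angle from four atom positions. The helper names are hypothetical and the coordinates in the example are made up, not taken from any structure in this report. For residue i, φ is conventionally defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = tuple(x - d0 * y for x, y in zip(b0, b1))
    w = tuple(x - d2 * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Made-up coordinates: the fourth atom is rotated a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Computing φ and ψ this way for every residue of a model and plotting the pairs gives the scatter that validation tools compare against the sterically allowed regions.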
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
for releasing local constraints on a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
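The gradient optimization described above can be sketched on the smallest possible system: two atoms interacting through a Lennard-Jones potential, started too close together so that van der Waals repulsion dominates. This is a toy example under stated assumptions, not the minimizer used in this work; the reduced units (epsilon = sigma = 1), the step size, and the step cap are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy E(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, learning_rate=0.01, max_step=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent on the interatomic distance r.

    Each step moves r against the gradient (i.e. along the force); the step
    is capped so the huge repulsive gradient at short range cannot overshoot.
    """
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        step = -learning_rate * g
        step = max(-max_step, min(max_step, step))
        r += step
    return r


r_min = minimize(0.9)  # start with the two atoms too close together
print(round(r_min, 4), round(lj_energy(r_min), 4))  # 1.1225 -1.0
```

The minimizer stops where the force vanishes, at r = 2^(1/6) sigma. In a real molecule with many minima the same downhill procedure only reaches the nearest one, which is the limitation noted in the text.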
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: Name: GLYCTK; Synonyms: HBEBP4; ORFNames: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206   6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp       27   6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                  27   6.0
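The hit list above can be screened by E-value before choosing a template, in line with the earlier advice to prefer hits with a sufficiently low E-value. The sketch below is illustrative only: the cutoff of 1e-5 is a hypothetical threshold, not one stated in this report, and the last two E-values are read as 6.0 on the assumption that their decimal points were lost in extraction.

```python
# (PDB chain, score, E-value) taken from the hit list above.
# The two E-values of 6.0 assume a decimal point was lost from "60".
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def select_templates(hit_list, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first.

    max_evalue is a hypothetical cutoff chosen for illustration.
    """
    kept = [h for h in hit_list if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])


candidates = select_templates(hits)
print([name for name, _, _ in candidates])  # ['1QGT-C', '2QIJ-C', '2G33-C']
```

Only the three hepatitis B capsid chains pass such a filter; the xylanase and glutathione S-transferase hits have E-values consistent with chance matches.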
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
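The two extinction coefficients reported above can be reproduced from the amino acid composition. The sketch below uses the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1); the function names are hypothetical, and the molecular weight of 55252.6 assumes the decimal point was lost from "552526" in the extracted text.

```python
def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm (M-1 cm-1) from composition.

    Each cystine is one disulfide-bonded pair of Cys residues, so an odd
    Cys count leaves one residue unpaired.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * 1490 + n_trp * 5500 + n_cystine * 125


def absorbance_01_percent(ext, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution: extinction / molecular weight."""
    return ext / molecular_weight


# Composition from the ProtParam output above: Tyr 5, Trp 4, Cys 5.
ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)
print(ext_paired, round(absorbance_01_percent(ext_paired, 55252.6), 3))    # 29700 0.538
print(ext_reduced, round(absorbance_01_percent(ext_reduced, 55252.6), 3))  # 29450 0.533
```

The agreement with the reported 29700/0.538 and 29450/0.533 is a useful cross-check that the composition and molecular weight were read correctly.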
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
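An identity figure of this kind can be recomputed from any two aligned rows. The following is a minimal Python sketch, assuming identity is counted only over columns where neither sequence has a gap; other conventions exist, and this is not necessarily the definition used by the alignment tool in this project. The helper name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # a gap in either row is not counted as a compared column
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy rows, not taken from the report's alignment.
    print(percent_identity("AC-GT", "ACTGA"))
```

Counting gapped columns as mismatches instead would lower the figure, so the convention should be stated whenever an identity value is reported.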
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - Save - Layer (pdb)
Choose File - Save - Project (pdb)
Choose Swiss model - submit modeling request (a new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - Save - Layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044    Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles of each residue and shows which combinations of the two angles are sterically possible for a polypeptide. Because φ and ψ are the rotations about the N-Cα and Cα-C bonds of each residue, the plot summarizes the backbone geometry of the protein and makes residues with unfavourable conformations easy to spot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
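The angles plotted in a Ramachandran diagram are ordinary dihedral angles between four consecutive backbone atoms: φ is defined by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following Python sketch shows one way to compute such an angle from four 3D coordinates. It is an illustration only, not the code used by SAVS or PROCHECK, and the function name `dihedral_degrees` is hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the outer points are a quarter turn apart about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real structure the four points would be the atom coordinates read from the PDB file for each residue and its neighbours.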
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score   %-age
Residues in most favoured regions      [A,B,L]            990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
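The percentages in the plot statistics follow directly from the residue counts. The short Python sketch below recomputes them from the four region counts reported above and compares the most favoured fraction with the 90% guideline; the dictionary keys and the function name `region_percentages` are illustrative choices, not PROCHECK identifiers.

```python
def region_percentages(counts):
    """Return each region's share of the non-Gly, non-Pro residues, in percent."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Residue counts as reported in the PROCHECK summary of this report.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = region_percentages(counts)
meets_guideline = percentages["most favoured"] > 90.0

if __name__ == "__main__":
    print(sum(counts.values()), percentages, meets_guideline)
```

With these counts the most favoured fraction is below the 90% guideline, so the model would need further refinement to be considered good quality by that criterion.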
Docking results (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair    RMS    Bumps
----  --------  --------  --------  --------  -----    -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
------------------------------------------------------------------------------
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp    RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
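The size of the search reported in the log can be checked by simple arithmetic. The Python sketch below assumes that the bracketed figures on the "Total 6D space" line read as 27 x 812 x 1 iterated orientations and a 64 x 24 x 48 FFT grid, and that the distance range runs from 0.00 to 19.50 in steps of 0.75; these readings are an interpretation of the printed log, not values taken from the Hex documentation.

```python
from math import prod

# Assumed reading of "Iterate[...] x FFT[...]" from the docking log.
iterate = (27, 812, 1)   # distance steps x receptor rotations x ligand rotations
fft_grid = (64, 24, 48)  # samples handled by each 3D FFT

n_ffts = prod(iterate)
n_orientations = n_ffts * prod(fft_grid)

# Number of intermolecular distances sampled between 0.00 and 19.50 at 0.75 spacing.
n_distance_steps = round(19.50 / 0.75) + 1

if __name__ == "__main__":
    print(n_ffts, n_orientations, n_distance_steps)
```

Under these assumptions the products agree with the 21924 FFTs and 1616412672 orientations that the log reports, and the number of distance steps agrees with the first factor of the iteration count.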
Conclusion
After analysing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different host species. It would be interesting to look more
closely at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution in the near future, and the genes responsible for pathogenesis
can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling and presented a theoretical
model built with available online modelling tools.
Our study suggests that the HBeAg-binding protein (glycerate kinase)
encoded by this gene is a secondary cause of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards
such discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801.
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Mèdica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.
3. FASTA
FASTA is a DNA and protein sequence alignment software package first
described (as FASTP) by David J. Lipman and William R. Pearson in 1985
in the article "Rapid and sensitive protein similarity searches". The
original FASTP program was designed for protein sequence similarity
searching. FASTA, described in 1988 ("Improved Tools for Biological
Sequence Comparison"), added the ability to do DNA:DNA searches and
translated protein:DNA searches, and also provided a more sophisticated
shuffling program for evaluating statistical significance. There are
several programs in this package that allow the alignment of protein
sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for
"FAST-All", because it works with any alphabet; it is an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein,
DNA:DNA, protein:translated DNA (with frameshifts), and ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when
comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A
major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to
have occurred by chance or whether it can be used to infer homology. The
FASTA package is available from fasta.bioch.virginia.edu.
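The Smith-Waterman algorithm mentioned above fills a dynamic-programming matrix in which every cell holds the best score of a local alignment ending at that pair of residues, with scores never allowed to drop below zero. The Python sketch below is a simplified illustration that uses a fixed match/mismatch score and a linear gap penalty; it is not the SSEARCH implementation, which works with substitution matrices and affine gap penalties. The function name and the default scores are assumptions made for this example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]  # first column is always zero
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                # start a new local alignment here
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                cur[j - 1] + gap, # gap in a
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only two rows are kept in memory because the score alone is needed; recovering the alignment itself would require the full matrix and a traceback.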
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences and to identify library
sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either
as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
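As an illustration of how one of these parameters is derived from the sequence alone, the aliphatic index is commonly defined as X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)), where X is the mole percent of each residue and the coefficients a = 2.9 and b = 3.9 reflect the relative side-chain volumes. The Python sketch below implements that formula; it is a minimal example rather than ProtParam's own code, and the function name is hypothetical.

```python
def aliphatic_index(sequence: str, a: float = 2.9, b: float = 3.9) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Toy tetrapeptide with one residue of each aliphatic type.
    print(aliphatic_index("AVIL"))
```

The input is assumed to be a raw one-letter sequence with white space and numbers already removed.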
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by e-mail to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
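The success rates quoted for three-state prediction are per-residue accuracies: the fraction of residues whose predicted state matches the observed one. A minimal Python sketch of that measure follows, assuming both strings use one letter per residue (H for helix, E for sheet, C for coil); the function name `q3_accuracy` and the letter coding are choices made for this example, not part of SOPMA.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


if __name__ == "__main__":
    # Toy example with eight residues, one of them predicted wrongly.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```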
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Retrieved the PDB ID of the template structure through a BLAST similarity search (PDB ID 2B8N)
Loaded the target sequence in pdb format in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters available in SAVS, such as the Ramachandran plot
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
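Agreement between matched Cα atoms, as quoted above in ångströms, is commonly expressed as a root-mean-square deviation (RMSD). The following is a minimal sketch that assumes the two coordinate sets are already superposed and listed in the same residue order; the function name `ca_rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], template: Sequence[Point]) -> float:
    """RMSD between matched C-alpha atoms of two pre-superposed structures."""
    if len(model) != len(template) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, template):
        # Squared distance between the two matched atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


if __name__ == "__main__":
    # Toy coordinates: every atom of the model is shifted by 2 units along z.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    template = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, template))
```

In practice the structures must first be optimally superposed; that step is deliberately left out of this sketch.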
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller in size than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell or on a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed to
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The table below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ against ψ of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
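Each point on such a plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below shows one standard way to compute a signed torsion angle from four 3D points; the helper names and the test coordinates are illustrative assumptions, not part of any tool used in this project.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometry: the outer bonds are at right angles when viewed along z.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral` to the backbone atoms of every residue gives the (φ, ψ) pairs that are plotted against each other.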
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. It generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
Result and discussion
1 Swiss-Prot entry: protein sequence Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27     6.0
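The hit list above can be screened by E-value before a template is chosen. The sketch below is illustrative only: the cutoff of 1e-5 is an assumed value, not one stated in this report, the E-values of the two weakest hits are read as 6.0, and the `Hit` type and function name are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str
    score: float
    evalue: float


# Hits transcribed from the list above (the last two E-values read as 6.0).
HITS = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),
    Hit("1AW9-A", 27, 6.0),
]


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, most significant first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit.accession, hit.score, hit.evalue)
```

With this cutoff only the three hepatitis B capsid chains remain as candidate templates; the xylanase and glutathione S-transferase hits are discarded as chance matches.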
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids)
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
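The two extinction coefficients reported above can be reproduced from the residue counts (Tyr 5, Trp 4, Cys 5). The sketch assumes the usual molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 per cystine; the function name is illustrative.

```python
def extinction_coefficient(n_tyr: int, n_trp: int, n_cys: int,
                           cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm, in M-1 cm-1.

    If cystines is True, Cys residues are assumed to pair up into
    cystines (an odd Cys is left unpaired); otherwise none are paired.
    """
    n_cystine = n_cys // 2 if cystines else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported by ProtParam for Q8IVS8
    for paired in (True, False):
        ext = extinction_coefficient(n_tyr=5, n_trp=4, n_cys=5, cystines=paired)
        # Abs 0.1% (1 g/l) is the molar coefficient divided by the molecular weight.
        print(ext, round(ext / mw, 3))
```

Dividing each coefficient by the molecular weight gives the absorbance of a 1 g/l solution, which matches the Abs 0.1% values in the listing.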
Secondary structure prediction
By SOPMA, result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
==============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
==============================================================================
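The pairwise scores in this table are percent identities. A minimal sketch of that calculation for two already aligned sequences is given below, assuming identity is counted over positions where neither sequence has a gap; the function name and the short example strings are hypothetical, and ClustalW's own pairwise alignment step is not reproduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned positions where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Two short, made-up aligned fragments with one gapped column.
    print(percent_identity("MQLF-HL", "MYLFAHL"))
```

A score near 3, as between the human glycerate kinase and the HBeAg sequences, therefore means the two sequences share almost no identical aligned residues.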
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file); choose Swiss model - load the raw target sequence
Choose Fit - fit raw sequence, then magic fit, then iterative fit
Choose File - save - layer (pdb)
Choose File - save - project (pdb)
Choose Swiss model - submit modeling request (a new browser window will be opened; load the pdb file and give the Email ID for receiving the modeled structure)
Open the new structure (received by Email) and remove the template by selecting the target
Choose File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title Lakshay projectWill be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig structure of template after modeling
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of φ and ψ angles for a polypeptide. The Ramachandran plot displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
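The percentages in the plot statistics follow directly from the residue counts, and the 90% criterion can be checked the same way. The sketch below uses the four counts reported above; the dictionary keys and function names are illustrative and are not PROCHECK output fields.

```python
from typing import Dict

# Residue counts taken from the plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to evaluate")
    return {region: 100.0 * n / total for region, n in counts.items()}


def is_good_quality(counts: Dict[str, int], threshold: float = 90.0) -> bool:
    """True if more than `threshold` percent lie in the most favoured regions."""
    return region_percentages(counts)["most favoured"] > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(region, round(pct, 1))
    print(is_good_quality(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% expected of a good quality model.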
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analysing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in survival
in different species. It would be interesting to take a closer look at the
matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that
they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will be possible in the near future to draw firm conclusions about the
true picture of evolution, and that the genes responsible for pathogenesis can
also be identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To find the unknown structure of a protein present in the different species,
we performed homology modelling. We took a step forward by presenting a
theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such
discoveries. It is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to
develop drugs more efficiently and more effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for
the infectious disease bird flu, and will also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid
the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible
therapeutic sites can be identified in them. A similar method can also be
applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D. et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
major focus of the package is the calculation of accurate similarity statistics,
so that biologists can judge whether an alignment is likely to have occurred
by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.
4. BLAST
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is
an algorithm for comparing primary biological sequence information, such
as the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and to identify library
sequences that resemble the query sequence above a certain threshold.
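The idea of scoring a query against every library sequence and keeping the hits above a threshold can be sketched as follows. This is not the BLAST algorithm, which uses a seeded heuristic and reports statistical E-values. The sketch substitutes an exhaustive Smith-Waterman local alignment score. The scoring values, function names and toy sequences are all illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search_library(query, library, threshold):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Toy library (hypothetical sequences, for illustration only).
library = {"seq1": "TTACGTTT", "seq2": "GGGG"}
print(search_library("ACGT", library, threshold=5))
```

Here only `seq1`, which contains the query as an exact substring, passes the threshold.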
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be
deduced from a protein sequence. No additional information is required
about the protein under consideration. The protein can be specified either as
a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw
sequence. White space and numbers are ignored. If you provide the
accession number of a Swiss-Prot/TrEMBL entry, you will be prompted
with an intermediary page that allows you to select the portion of the
sequence on which you would like to perform the analysis. The choice
includes a selection of mature chains or peptides and domains from the
Swiss-Prot feature table (which can be chosen by clicking on the positions),
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
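As an example of how one of these parameters is derived from the sequence alone, the aliphatic index is commonly defined as X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X is the mole percent of each residue and a = 2.9, b = 3.9. The sketch below is a minimal illustration of that formula, assuming those coefficients. The function name `aliphatic_index` and the test sequence are hypothetical, and this is not the ProtParam source code.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index of a protein sequence given in one-letter code.

    White space and digits are ignored, mirroring how a raw sequence
    may be pasted into a web form.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    n = len(residues)

    def mole_percent(residue):
        return 100.0 * residues.count(residue) / n

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Hypothetical four-residue peptide: one each of Ala, Leu, Ile and Val.
print(aliphatic_index("ALIV"))
```

A higher value indicates a larger relative volume occupied by aliphatic side chains.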
Using SOPMA for secondary structure analysis
Recently a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
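The success rates quoted above are three-state accuracies: the percentage of residues whose predicted state matches the observed one. A minimal sketch of that measure follows. The one-letter state codes (H for helix, E for sheet, C for coil), the function name `q3_accuracy` and the example strings are assumptions made for illustration and are not SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Hypothetical nine-residue example with a single mismatch.
print(round(q3_accuracy("HHHEEECCC", "HHHEECCCC"), 1))
```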
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database
Ran a BLAST similarity search and retrieved the PDB ID of the template structure (PDB ID 2B8N)
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and obtained the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
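Because model quality is tied so closely to sequence identity, it helps to see how that number can be obtained from a target-template alignment. The sketch below counts identical residues over the aligned columns. Using the non-gap columns as the denominator is one convention among several and is an assumption here, as are the function name `percent_identity` and the toy alignment.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(t, m) for t, m in zip(aligned_target, aligned_template)
               if t != gap and m != gap]
    if not columns:
        return 0.0
    matches = sum(1 for t, m in columns if t == m)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): 6 gap-free columns, 5 identical residues.
print(round(percent_identity("ACDE-FG", "ACDQHFG"), 1))
```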
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
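The E-value screen described above can be sketched in a few lines of Python.
This is only an illustration: the hit list reuses the PDB identifiers, scores
and E-values listed in the BLAST results of this report (the two weakest
E-values are read here as 6.0, which is an assumption), while the cutoff of
1e-5, the Hit record and the select_templates function are hypothetical names
and values chosen for the example, not part of BLAST itself.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template: str   # PDB entry and chain
    score: float    # bit score
    evalue: float   # expectation value (lower is better)


def select_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest) first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


# Hits as listed in the report; 6.0 is an assumed reading of the last two.
hits = [
    Hit("1TA3-B", 27, 6.0),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("1QGT-C", 206, 6e-54),
    Hit("1AW9-A", 27, 6.0),
    Hit("2G33-C", 192, 6e-50),
]

candidates = select_templates(hits)
for h in candidates:
    print(h.template, h.score, h.evalue)
```

With this assumed cutoff only the three capsid structures survive, and the
two weak hits are discarded rather than being used as doubtful templates.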
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interfaces are said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described
as a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes
the interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are
incurred. If the van der Waals repulsion can be decreased, energy loss in the
system will be minimized. This can be accomplished by allowing flexibility in
the receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement, which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to find new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also
open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components.
It is a molecule within a cell surface to which a substance (such as a hormone
or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a
ligand binds through the interaction of many weak noncovalent bonds formed
with the binding site of a protein, the tight binding of a ligand depends upon
a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
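As a small illustration of how such a preference table might be used, the
Python sketch below ranks the residues and averages the values over a set of
residues lining a candidate site. The numbers are those of the table above,
read as three-decimal fractions (for example 0.360 for His), which is an
assumption about the original formatting; the function names and the example
residue sets are hypothetical.

```python
# Preference of each amino acid to contact bound non-protein atoms.
# Values taken from the table above, assuming three decimal places.
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def ranked_residues():
    """Residue names ordered from most to least preferred."""
    return sorted(PROPENSITY, key=PROPENSITY.get, reverse=True)


def mean_propensity(residues):
    """Average preference over the residues lining a candidate site."""
    return sum(PROPENSITY[r] for r in residues) / len(residues)


ranked = ranked_residues()
print("most preferred:", ranked[:3])
print("least preferred:", ranked[-3:])
print(round(mean_propensity(["His", "Cys", "Ser"]), 3))
print(round(mean_propensity(["Leu", "Pro", "Trp"]), 3))
```

A positive average only hints that a patch resembles known ligand-contacting
regions; as noted above, it says nothing about protein-protein interfaces.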
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angle phi (φ) against psi (ψ)
for the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations
are described by the torsion angles φ and ψ, respectively, and the plot is
drawn between these two angles. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
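The torsion angles plotted in a Ramachandran diagram are ordinary dihedral
angles defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and
ψ by N, Cα, C, N(i+1). The following Python sketch shows one common way to
compute a signed dihedral angle from four points using the atan2 formulation.
The coordinates in the usage lines are made-up test points, not atoms from
any structure in this report, and the helper names are illustrative.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the first and last atoms on opposite sides (trans),
# on the same side (cis), and a quarter turn apart.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
print(abs(trans), cis, quarter)
```

Applying such a function to every residue of a model gives the (φ, ψ) pairs
that validation programs place on the plot.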
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory,
which have the same name as the original PDB file but with different
extensions. The residue-by-residue listing has a .out extension and lists all
the computed stereochemical properties, by residue, in a printable ASCII text
file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
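The idea of gradient-based minimization can be illustrated on the smallest
possible system: two atoms interacting through a Lennard-Jones potential,
started too close together so that the van der Waals repulsion dominates.
The Python sketch below is only a toy model under stated assumptions:
reduced units (epsilon = sigma = 1), a fixed step size with a cap on the
displacement per step, and function names chosen for this example. It is not
the minimizer used by any of the programs mentioned in this report.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the pair energy (the force is its negative)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize(r, rate=0.01, max_move=0.05, tol=1e-8, max_steps=10000):
    """Steepest descent on the distance r, with a capped move per step."""
    steps = 0
    for steps in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < tol:          # forces are small: stop
            break
        move = max(-max_move, min(max_move, -rate * g))
        r += move
    return r, steps


r_start = 0.9                     # atoms too close: large repulsion
r_min, n_steps = minimize(r_start)
print("start energy:", lj_energy(r_start))
print("final distance:", r_min, "after", n_steps, "steps")
print("final energy:", lj_energy(r_min))
```

The search slides downhill to the nearest minimum, at a distance of
2^(1/6) sigma with energy -epsilon, and stops there. In a real molecule with
many minima the same behaviour explains why minimization finds only the best
nearby conformation.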
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag)Human Hepatitis B Viral Capsid >gi|5               206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27   6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27   6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
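The two extinction coefficients above can be reproduced by hand from the
residue counts. ProtParam estimates the molar extinction coefficient at
280 nm as a weighted sum of the numbers of Tyr, Trp and cystine residues
(1490, 5500 and 125 M-1 cm-1 respectively), and the absorbance of a 1 g/l
solution as that coefficient divided by the molecular weight. The Python
sketch below checks the arithmetic; the molecular weight is read here as
55252.6, an assumption about the stripped decimal point that is consistent
with the reported absorbance values. The function name is illustrative.

```python
# Per-residue contributions at 280 nm in water (M-1 cm-1).
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125


def extinction_coefficient(n_tyr, n_trp, n_cystine=0):
    """Estimated molar extinction coefficient at 280 nm."""
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Counts from the amino acid composition of GLCTK_HUMAN reported above.
n_tyr, n_trp, n_cys = 5, 4, 5
mol_weight = 55252.6              # assumed reading of "552526"

reduced = extinction_coefficient(n_tyr, n_trp)                # no cystines
oxidized = extinction_coefficient(n_tyr, n_trp, n_cys // 2)   # Cys paired up

print(reduced, round(reduced / mol_weight, 3))
print(oxidized, round(oxidized / mol_weight, 3))
```

Five cysteines can form at most two cystines, which accounts for the
difference of 250 between the two reported coefficients.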
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file).
2) Choose Swiss model - load the raw target sequence.
3) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4) Choose File - save - layer (pdb).
5) Choose File - save - project (pdb).
6) Choose Swiss model - submit modeling request. (A new browser window will be opened; load the pdb file and give the email ID for receiving the modeled structure.)
7) Open the new structure (received by email) and remove the template by selecting the target.
8) Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational
analysis of the molecular structure and sequence of proteins. The
stereochemical validation of model structures of proteins is an important part
of the comparative molecular modeling process. The Ramachandran plot is a way
to visualize the dihedral angles φ against ψ of amino acid residues in a
protein structure. It shows the possible conformations of the φ and ψ angles
for a polypeptide and displays the ψ and φ backbone conformational angles for
each residue in a protein. Because φ and ψ fix the relative orientation of
successive peptide units, the plot also indicates the local geometry around
each alpha carbon atom in the backbone chain of the protein of interest.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using
DeepView and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms
and R-factor no greater than 20%, a good quality model would be expected
to have over 90% in the most favoured regions.
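The percentages in the PROCHECK summary follow directly from the residue
counts, and the 90% guideline can be checked in the same step. The short
Python sketch below recomputes them; the counts are those reported above,
while the variable names and the way the threshold test is written are
illustrative choices.

```python
# Residue counts per Ramachandran region, from the PROCHECK summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Gly and Pro are excluded from the region statistics.
non_gly_pro = sum(counts.values())
percentages = {
    region: round(100.0 * n / non_gly_pro, 1) for region, n in counts.items()
}
total_residues = non_gly_pro + end_residues + glycines + prolines

# PROCHECK guideline: over 90% of residues in the most favoured regions.
good_quality = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("total residues:", total_residues)
print("meets 90% guideline:", good_quality)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed
regions, the model falls short of the guideline quoted above.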
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that, although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will be possible in the near future to draw firm
conclusions about the true picture of its evolution, and to identify the genes
responsible for pathogenesis.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein
present in the different species, we performed homology modelling and
present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (Glycerate kinase)
protein encoded by the gene is one of the secondary causes of the
pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which drugs
could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] - C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] - PubMed
[5] - Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] - Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller,
W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402. [PubMed]
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas,
M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] - Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] - Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
as well as the possibility to enter start and end positions in two boxes. By
default (i.e. if you leave the two boxes empty), the complete sequence will be
analyzed.
It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
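
Two of the parameters listed above can be computed directly from the amino acid composition. The sketch below is illustrative only and is not the code of the tool described here: the function names are hypothetical, the aliphatic index uses the commonly quoted weights 2.9 (Val) and 3.9 (Ile, Leu) applied to mole percentages, and the extinction coefficient at 280 nm assumes the commonly quoted molar values of 1490 (Tyr), 5500 (Trp) and 125 (cystine) M^-1 cm^-1.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    # Mole percentage of each aliphatic residue type
    pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    # Assumed weights: 2.9 for Val, 3.9 for Ile and Leu, relative to Ala
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def extinction_coefficient(sequence: str, cystines: int = 0) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    The number of disulfide-bonded cystines cannot be read from the
    sequence alone, so it is passed in by the caller (default 0).
    """
    seq = sequence.upper()
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * cystines


if __name__ == "__main__":
    toy = "MAVILYWW"  # made-up sequence, for illustration only
    print(aliphatic_index(toy), extinction_coefficient(toy))
```

The half-life and instability index rely on lookup tables (N-terminal residue rules and dipeptide weights) and are therefore left out of this sketch.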
Using SOPMA for secondary structure analysis
Recently, a new method called the self-optimized prediction method (SOPM)
has been described to improve the success rate in the prediction of the
secondary structure of proteins. In this paper we report improvements
brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA)
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
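
The percentage of correctly predicted residues quoted above is a three-state accuracy. As a minimal sketch, assuming that both the predicted and the observed secondary structure are given as strings over the letters H (helix), E (sheet) and C (coil), it can be computed as follows. The function name and the example strings are hypothetical and are not taken from SOPMA.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Made-up eight-residue example with one wrongly predicted residue
    print(three_state_accuracy("HHHEECCC", "HHHEEECC"))
```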
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence in PDB format into SWISS-MODEL as a raw sequence and modelled the receptor.
Validated the modelled receptor using the Structure Analysis and Validation Server (SAVS).
Verified our model using different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
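
Because model quality is tied to sequence identity, it helps to make the identity calculation explicit. The following is a minimal sketch, assuming two rows of a pairwise alignment of equal length with "-" as the gap character, and assuming that identity is reported over the columns in which both sequences have a residue (other conventions divide by the alignment length or by the shorter sequence). The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != gap and t != gap]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 6 gap-free columns, 5 of them identical
    print(round(percent_identity("MKT-AYIA", "MKSGAY-A"), 1))
```

A value near 70% would place a target-template pair in the regime where the text expects roughly 2 Å agreement, whereas a value near 25% corresponds to the much less reliable regime.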
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features, such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles, are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important is the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can
be predicted from the template, together with the plausibility of the resulting
model. Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
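
The selection criteria above (a sufficiently low E-value, then coverage and similarity) can be sketched as a simple filter-and-rank step. This is an illustration under stated assumptions, not the procedure of any particular server: the `TemplateHit` record, the thresholds `max_e_value` and `min_coverage`, the ranking key and the hit names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    template_id: str   # hypothetical identifier of the candidate template
    e_value: float     # E-value reported by the database search
    identity: float    # percent sequence identity to the query
    coverage: float    # fraction of the query covered by the alignment (0-1)


def choose_template(hits: List[TemplateHit],
                    max_e_value: float = 1e-5,
                    min_coverage: float = 0.5) -> Optional[TemplateHit]:
    """Discard hits with a poor E-value or low coverage, then rank the rest."""
    usable = [h for h in hits
              if h.e_value <= max_e_value and h.coverage >= min_coverage]
    if not usable:
        return None  # no trustworthy template: consider fold recognition instead
    # Assumed ranking: identity weighted by coverage, ties broken by lower E-value
    return max(usable, key=lambda h: (h.identity * h.coverage, -h.e_value))


if __name__ == "__main__":
    hits = [
        TemplateHit("TPL1", 1e-40, 62.0, 0.95),
        TemplateHit("TPL2", 1e-60, 70.0, 0.40),  # good E-value, poor coverage
        TemplateHit("TPL3", 0.5, 80.0, 0.99),    # poor E-value
    ]
    best = choose_template(hits)
    print(best.template_id if best else "no usable template")
```

Returning `None` instead of the least bad hit mirrors the advice that a template with a poor E-value should not be used even when it is the only one available.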
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aimed at finding a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid; the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints and
thus are said to be rigid.
Fig: Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid-ligand, flexible-receptor complex often
maximizes the interface area between the molecules. They move with respect to
one another in a direction perpendicular to the interface. This allows
a receptor to bind a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred.
If the van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement, which allows small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing easier,
more energetically favorable binding between the two.
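
The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be made concrete with a short sketch. It assumes that the ligand is represented as a list of (x, y, z) coordinates and that the rotation is composed as Rz(alpha) Ry(beta) Rx(gamma). This angle convention and the function names are choices made for illustration and are not the parameterisation used by any particular docking program.

```python
import math


def rotation_matrix(alpha: float, beta: float, gamma: float):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rx(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def transform(coords, angles, translation):
    """Apply three rotations followed by three translations to every atom."""
    r = rotation_matrix(*angles)
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # made-up coordinates
    pose = transform(ligand, (math.pi / 2, 0.0, 0.0), (0.0, 0.0, 5.0))
    print(pose)
```

A rigid-body search scores many such poses of one molecule against the other; because the transform preserves all interatomic distances within the moved molecule, any conformational adaptation has to be modelled separately.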
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angle φ (phi) against ψ (psi)
of the amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
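
Each point on such a plot is a pair of torsion angles, and a torsion angle is defined by four consecutive atoms. The sketch below shows one standard way to compute the signed angle from four 3D points. The function name is hypothetical, and which backbone atoms are passed in (for example C, N, Cα, C of consecutive residues for φ) is left to the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: the fourth atom is rotated a quarter turn from the first
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing φ and ψ for every residue in this way and plotting one against the other gives the scatter of points that validation tools compare with the sterically allowed regions.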
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how
many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but different
extensions. The residue-by-residue listing has a .out extension and lists all
the computed stereochemical properties by residue in a printable ASCII text
file.
ENERGY MINIMIZATION
The energy of a molecule is a function of its degrees of freedom (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries by
moving atoms to release internal constraints. It is good at releasing local
constraints around a residue, but it will not cross high energy barriers and
therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a single numerical value for a given conformation. This
number can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms that are too near each other in space and have a huge
van der Waals repulsion energy. It is therefore often preferable to carry out
energy minimization on a conformation to find the best nearby conformation.
Energy minimization is usually performed by gradient optimization: atoms are
moved so as to reduce the net forces on them. The minimized structure has
small forces on each atom and therefore serves as an excellent starting point
for molecular dynamics simulations.
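The idea of moving atoms along the net force until the force is small can be illustrated with a deliberately tiny model. The Python sketch below applies steepest descent to the separation of two atoms interacting through a Lennard-Jones potential in reduced units. The potential, the parameter values and the function names are assumptions chosen for illustration; real force fields sum many bonded and non-bonded terms over all atoms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, i.e. -dE/dr; positive values push the atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize(r, step=0.01, max_move=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the force until it falls below tol."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r


# Start from a "bad contact": two atoms closer than the optimal distance.
r_start = 0.9
r_min = minimize(r_start)
print("start: r = %.3f  E = %8.3f" % (r_start, lj_energy(r_start)))
print("final: r = %.4f E = %8.3f" % (r_min, lj_energy(r_min)))
```

The starting geometry has a large positive energy from a single close contact, and the minimizer relaxes it to the nearest minimum at r = 2^(1/6) sigma. It cannot leave that basin, which mirrors the local-minimum behaviour described above.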
Result and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
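The two extinction coefficients can be reproduced from the composition. The sketch below uses the per-residue molar extinction values at 280 nm that ProtParam documents (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1), together with the counts reported above (5 Tyr, 4 Trp, 5 Cys). It assumes the molecular weight reads 55252.6 once the lost decimal point is restored; the function names are illustrative.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1), as used by ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_as_cystine=True):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_as_cystine else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


MOLECULAR_WEIGHT = 55252.6  # ASSUMPTION: "552526" with the decimal restored

with_cystines = extinction_coefficient(5, 4, 5, all_cys_as_cystine=True)
without_cystines = extinction_coefficient(5, 4, 5, all_cys_as_cystine=False)
print(with_cystines, round(absorbance_at_1_g_per_l(with_cystines, MOLECULAR_WEIGHT), 3))
print(without_cystines, round(absorbance_at_1_g_per_l(without_cystines, MOLECULAR_WEIGHT), 3))
```

Both cases agree with the values in the ProtParam output, which also supports reading the molecular weight as roughly 55 kDa.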
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
310 helix       (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
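The pairwise scores in this table are commonly described as percent identities between aligned pairs. As a hedged illustration of that kind of score, the Python sketch below counts identical residues over the aligned, non-gap positions of two sequences. The exact scoring that ClustalW2 applies internally is an assumption here, and the two fragments are a short stretch of the HBVA4 and HBVA6 rows of the alignment, used only as sample input.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap positions are not scored
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Sample fragments (32 aligned columns, two substitutions at the start).
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))
```

On whole sequences the same calculation explains why the closely related HBV genotype entries score 97 to 98, while glycerate kinase scores only 2 to 3 against every HBeAg sequence.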
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
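A guide tree of this kind is normally written in Newick format, where each leaf is followed by a colon and its branch length. The Python sketch below pulls the leaf branch lengths out of such a string with a regular expression. The Newick string is a reconstruction that assumes the colons, commas and decimal points lost in extraction (for example, "059519" is read as ":0.59519"), and the helper name leaf_branch_lengths is illustrative.

```python
import re

# ASSUMPTION: Newick punctuation restored from the flattened guide tree.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name followed by ":length"; internal branches follow ")" and
# therefore have no name in front of the colon, so they are not matched.
LEAF = re.compile(r"([A-Za-z0-9_|]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves; longest terminal branch:", longest, lengths[longest])
```

Read this way, the human glycerate kinase sequence sits on by far the longest terminal branch, consistent with its very low pairwise scores against the HBeAg sequences.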
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - Load raw sequence to load the target sequence
Choose Fit - Fit raw sequence, then Magic fit, then Iterative fit
Choose File - Save - Layer (PDB)
Choose File - Save - Project (PDB)
Choose Swiss model - Submit modeling request (a new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose File - Save - Layer (PDB)
Open Swiss model and select the Load raw sequence option to load the target molecule
Perform Magic fit and Iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "Submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: request submission form
Your e-mail address: Lakshay1202gmailcom
Your name: Lakshay
Request title: Lakshay project
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure: it displays the φ and ψ backbone conformational angles for each residue and shows which conformations are possible for a polypeptide. From the plot one can judge whether the backbone conformation of each residue in the modeled protein falls in a sterically allowed region.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Plot statistics                                          Score    %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
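The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The Python sketch below recomputes them from the four region counts reported above (990, 104, 11 and 51 non-glycine, non-proline residues); the dictionary and function names are illustrative, and the threshold is the PROCHECK rule of thumb rather than a hard pass/fail criterion.

```python
# Region counts taken from the plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if more than `threshold` per cent lie in the most favoured regions."""
    return region_percentages(counts)["most favoured"] > threshold


percentages = region_percentages(REGION_COUNTS)
for region, value in percentages.items():
    print("%-20s %5.1f%%" % (region, value))
print("Meets the 90% guideline:", meets_guideline(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the guideline, so further refinement of the outlying residues would be a reasonable next step.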
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue: A 253 LYS
Warning: Can't add all hydrogens to incomplete residue: A 316 HIS
Warning: Can't add all hydrogens to incomplete residue: A 318 LYS
Warning: Can't add all hydrogens to incomplete residue: A 393 LYS
Warning: Can't add all hydrogens to incomplete residue: B 8 LYS
Warning: Can't add all hydrogens to incomplete residue: B 9 LYS
Warning: Can't add all hydrogens to incomplete residue: B 17 LYS
Warning: Can't add all hydrogens to incomplete residue: B 34 LYS
Warning: Can't add all hydrogens to incomplete residue: B 36 ASN
Warning: Can't add all hydrogens to incomplete residue: B 62 LYS
Warning: Can't add all hydrogens to incomplete residue: B 65 ARG
Warning: Can't add all hydrogens to incomplete residue: B 66 LYS
Warning: Can't add all hydrogens to incomplete residue: B 316 HIS
Warning: Can't add all hydrogens to incomplete residue: B 370 LYS
Warning: Can't add all hydrogens to incomplete residue: B 380 TYR
Warning: Can't add all hydrogens to incomplete residue: B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they have an important
role in survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that
they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will soon be possible to draw firm conclusions
about the true picture of evolution and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To find the unknown structure of the protein
present in the different species, we used homology modelling. We went a step
further and presented a theoretical model built with available online modelling tools.
Our study indicated that the HBeAg-binding protein (glycerate kinase)
encoded by the gene is one of the secondary causes of the pathogenicity
of HBV. We therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone toward such
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking study may be useful in
the search for a cure for hepatitis B, and may also open new
avenues for finding other possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and
infectious diseases to identify possible therapeutic sites in them,
so that we can look forward to a
better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work still
needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seegerfcccedu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032,
USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389-3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
correctly predicts 69.5% of amino acids for a three-state description of the
secondary structure (α-helix, β-sheet and coil) in a whole database containing
126 chains of non-homologous (less than 25% identity) proteins. Joint
prediction with SOPMA and a neural networks method (PHD) correctly
predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions
are available by Email to deleage@ibcp.fr or on a Web page
(http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence (PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model against different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
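Since model quality is tied to the sequence identity between query and template, it can help to see how identity is read off an alignment. The following is a minimal sketch, assuming one common definition (identical residues divided by columns where neither sequence has a gap); the function name and the toy alignment are hypothetical and are not taken from the report's data.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap column (indel): not counted in this definition
        compared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment, for illustration only.
query = "MQLFHLCL-IISCT"
template = "MYLFHLCLVFACV-"
print(f"{percent_identity(query, template):.1f}% identity")
```

Other definitions divide by the alignment length or by the shorter sequence length, so identity values from different tools are not always directly comparable.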
The quality of the homology model depends on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity: a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
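The E-value screening described above can be sketched in a few lines. This is only an illustration: the `Hit` type, the function name and the cutoff of 1e-5 are assumptions, not a standard. The five hits are those listed in the BLAST result of this report, with the last two E-values read here as 6.0 (also an assumption, since the decimal point is lost in the listing).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template_id: str
    score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cutoff, lowest E-value first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


hits = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),  # E-value assumed to be 6.0
    Hit("1AW9-A", 27, 6.0),  # E-value assumed to be 6.0
]

for hit in select_templates(hits):
    print(hit.template_id, hit.score, hit.e_value)
```

A hard cutoff is the simplest possible rule; as the text notes, marginal hits may still deserve a look when function or genomic context supports them.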
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interactions are thus said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
a receptor to bind a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement, which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
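The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be made concrete with a short sketch. It assumes rotations applied about the x, then y, then z axes followed by a translation; this convention, the function names and the two-point "ligand" are illustrative choices and do not describe how HEX parameterizes its search.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def place_ligand(coords, angles, shift):
    """Rotate about x, then y, then z (radians), then translate by shift."""
    ax, ay, az = angles
    r = matmul(rot_z(az), matmul(rot_y(ay), rot_x(ax)))
    placed = []
    for x, y, z in coords:
        placed.append(tuple(
            r[i][0] * x + r[i][1] * y + r[i][2] * z + shift[i]
            for i in range(3)
        ))
    return placed


# Hypothetical two-atom ligand: quarter turn about z, then 7.5 units along z.
ligand = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pose = place_ligand(ligand, angles=(0.0, 0.0, math.pi / 2), shift=(0.0, 0.0, 7.5))
print(pose)
```

A rigid-body docking search scores many such poses; flexible docking adds internal degrees of freedom on top of these six.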
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study may be
useful in the search for a cure for hepatitis B, and may also open
new avenues for finding other possible drug targets in HBV. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand
binds through many weak noncovalent bonds formed with
the binding site of a protein, tight binding of a ligand depends on a
precise fit to the surface-exposed amino acid residues of the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation
free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
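Each point on a Ramachandran plot is a (φ, ψ) pair computed from backbone atom coordinates. The sketch below shows one standard way to compute a signed torsion angle from four points; φ is taken over C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1). The helper names are hypothetical, and the test coordinates are a toy geometry rather than atoms from the modeled protein.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions of one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: the fourth point is a quarter turn away from the first.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the (φ, ψ) pair of every residue and comparing against the allowed regions is, in essence, what the validation servers described next do.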
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures and assessing
how valid and correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that
the coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
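The idea of moving atoms along the force until the forces become small can be illustrated on the simplest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) potential. This is a sketch under stated assumptions, not the minimizer of any particular package: reduced units (epsilon = sigma = 1), a fixed step size, and a cap on the move per iteration are all illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-6, max_iter=1000):
    """Steepest descent on the interatomic distance r."""
    for _ in range(max_iter):
        force = -lj_gradient(r)
        if abs(force) < tol:
            break  # forces are small: a (local) minimum has been reached
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, step * force))
        r += move
    return r


r_start = 0.9  # atoms too near each other: large repulsion energy
r_min = minimize(r_start)
print(f"start r={r_start:.3f} E={lj_energy(r_start):.3f}")
print(f"final r={r_min:.4f} E={lj_energy(r_min):.4f}")
```

For this potential the minimum lies at r = 2^(1/6) sigma with energy -epsilon, which gives a simple check on the result. In a real molecule the same descent runs over all atomic coordinates and, as noted above, stops in the nearest local minimum.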
Result and discussion
1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2, Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I             27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
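The two extinction coefficients above can be reproduced from the amino acid composition. The sketch below assumes the per-residue molar coefficients at 280 nm commonly used for this calculation (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are hypothetical, and the residue counts and molecular weight are those reported for GLCTK_HUMAN in this section.

```python
# Assumed molar extinction coefficients at 280 nm (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Protein extinction coefficient at 280 nm from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if cystines else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight


# Counts from the composition table: Trp 4, Tyr 5, Cys 5; MW 55252.6.
for paired in (True, False):
    ext = extinction_coefficient(4, 5, 5, cystines=paired)
    print(paired, ext, round(absorbance_1_g_per_l(ext, 55252.6), 3))
```

With these inputs the sketch gives 29700 and 29450 for the coefficients and 0.538 and 0.533 for the absorbances, matching the values listed above.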
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name                Len(aa)  SeqB Name                Len(aa)  Score
===========================================================================
1    Q8IVS8|GLCTK_HUMAN  523      2    Q64896|HBEAG_ASHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      3    P03154|HBEAG_DHBV1  305      3
1    Q8IVS8|GLCTK_HUMAN  523      4    P0C6J9|HBEAG_DHBV3  305      3
1    Q8IVS8|GLCTK_HUMAN  523      5    P03153|HBEAG_GSHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      6    P0C692|HBEAG_HBVA2  214      3
1    Q8IVS8|GLCTK_HUMAN  523      7    P0C625|HBEAG_HBVA3  214      3
1    Q8IVS8|GLCTK_HUMAN  523      8    P17099|HBEAG_HBVA4  214      3
1    Q8IVS8|GLCTK_HUMAN  523      9    Q81105|HBEAG_HBVA5  214      3
1    Q8IVS8|GLCTK_HUMAN  523      10   Q91C37|HBEAG_HBVA6  214      2
2    Q64896|HBEAG_ASHV   217      3    P03154|HBEAG_DHBV1  305      21
2    Q64896|HBEAG_ASHV   217      4    P0C6J9|HBEAG_DHBV3  305      21
2    Q64896|HBEAG_ASHV   217      5    P03153|HBEAG_GSHV   217      91
2    Q64896|HBEAG_ASHV   217      6    P0C692|HBEAG_HBVA2  214      66
2    Q64896|HBEAG_ASHV   217      7    P0C625|HBEAG_HBVA3  214      65
2    Q64896|HBEAG_ASHV   217      8    P17099|HBEAG_HBVA4  214      65
2    Q64896|HBEAG_ASHV   217      9    Q81105|HBEAG_HBVA5  214      65
2    Q64896|HBEAG_ASHV   217      10   Q91C37|HBEAG_HBVA6  214      65
3    P03154|HBEAG_DHBV1  305      4    P0C6J9|HBEAG_DHBV3  305      97
3    P03154|HBEAG_DHBV1  305      5    P03153|HBEAG_GSHV   217      24
3    P03154|HBEAG_DHBV1  305      6    P0C692|HBEAG_HBVA2  214      26
3    P03154|HBEAG_DHBV1  305      7    P0C625|HBEAG_HBVA3  214      27
3    P03154|HBEAG_DHBV1  305      8    P17099|HBEAG_HBVA4  214      25
3    P03154|HBEAG_DHBV1  305      9    Q81105|HBEAG_HBVA5  214      26
3    P03154|HBEAG_DHBV1  305      10   Q91C37|HBEAG_HBVA6  214      26
4    P0C6J9|HBEAG_DHBV3  305      5    P03153|HBEAG_GSHV   217      25
4    P0C6J9|HBEAG_DHBV3  305      6    P0C692|HBEAG_HBVA2  214      26
4    P0C6J9|HBEAG_DHBV3  305      7    P0C625|HBEAG_HBVA3  214      27
4    P0C6J9|HBEAG_DHBV3  305      8    P17099|HBEAG_HBVA4  214      25
4    P0C6J9|HBEAG_DHBV3  305      9    Q81105|HBEAG_HBVA5  214      26
4    P0C6J9|HBEAG_DHBV3  305      10   Q91C37|HBEAG_HBVA6  214      26
5    P03153|HBEAG_GSHV   217      6    P0C692|HBEAG_HBVA2  214      70
5    P03153|HBEAG_GSHV   217      7    P0C625|HBEAG_HBVA3  214      69
5    P03153|HBEAG_GSHV   217      8    P17099|HBEAG_HBVA4  214      69
5    P03153|HBEAG_GSHV   217      9    Q81105|HBEAG_HBVA5  214      69
5    P03153|HBEAG_GSHV   217      10   Q91C37|HBEAG_HBVA6  214      69
6    P0C692|HBEAG_HBVA2  214      7    P0C625|HBEAG_HBVA3  214      98
6    P0C692|HBEAG_HBVA2  214      8    P17099|HBEAG_HBVA4  214      98
6    P0C692|HBEAG_HBVA2  214      9    Q81105|HBEAG_HBVA5  214      98
6    P0C692|HBEAG_HBVA2  214      10   Q91C37|HBEAG_HBVA6  214      98
7    P0C625|HBEAG_HBVA3  214      8    P17099|HBEAG_HBVA4  214      98
7    P0C625|HBEAG_HBVA3  214      9    Q81105|HBEAG_HBVA5  214      97
7    P0C625|HBEAG_HBVA3  214      10   Q91C37|HBEAG_HBVA6  214      98
8    P17099|HBEAG_HBVA4  214      9    Q81105|HBEAG_HBVA5  214      97
8    P17099|HBEAG_HBVA4  214      10   Q91C37|HBEAG_HBVA6  214      98
9    Q81105|HBEAG_HBVA5  214      10   Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
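A guide tree of this kind is normally written in Newick format, where each leaf appears as `name:branch_length` inside nested parentheses. As a minimal sketch, assuming that format, the leaf names and their branch lengths can be pulled out with a regular expression. The function `leaf_branch_lengths` and the toy tree below are hypothetical and are not taken from the report's output.

```python
import re


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name in a Newick string to its branch length.

    A leaf is a name that directly follows "(" or "," and is followed by
    ":length". Internal nodes follow ")" and are therefore skipped.
    """
    pattern = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.eE+-]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


# Toy tree with made-up names and lengths, for illustration only.
toy_tree = "((seqA|X:0.1,seqB|Y:0.2):0.05,seqC|Z:0.3);"
for name, length in leaf_branch_lengths(toy_tree).items():
    print(name, length)
```

Short branch lengths between two leaves that share a parent node indicate closely related sequences, which is how the tree groups sequences before the progressive alignment.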
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose Swiss model > load the raw target sequence.
Choose Fit > fit raw sequence, then magic fit, then iterative fit.
Choose File > Save > Layer (PDB).
Choose File > Save > Project (PDB).
Choose Swiss model > submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File > Save > Layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure. It shows which combinations of φ and ψ are sterically possible for a polypeptide, so residues that fall outside the allowed regions point to local errors in the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
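The φ and ψ angles plotted in a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for φ, and N, CA, C, N(i+1) for ψ. The following is a minimal sketch of how one such angle can be computed from four atomic coordinates. The function name `dihedral` and the test coordinates are illustrative assumptions, not part of SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first and last atoms lie on opposite sides
# of the central bond, i.e. a trans arrangement.
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

Applying this to every residue of a structure gives the (φ, ψ) pairs that are then binned into the favoured, allowed and disallowed regions.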
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          SCORE    %age
Residues in most favoured regions       [A,B,L]            990    85.6%
Residues in additional allowed regions  [a,b,l,p]          104     9.0%
Residues in generously allowed regions  [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
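The percentages in the plot statistics follow directly from the residue counts, taking the non-glycine, non-proline residues as the denominator. As a small worked check, the sketch below recomputes them from the four counts reported above and compares the most favoured fraction with the 90% guideline. The variable names are illustrative only.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region}: {pct}%")
print("over 90% in most favoured regions:", meets_guideline)
```

For this model the most favoured fraction falls below the 90% guideline, so by that criterion the model would benefit from further refinement before being used for docking.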
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 kJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
-------------------------------------------------------------------------
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
---------------------------------------------------------------------------
1     1     000000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
---------------------------------------------------------------------------
1     1     000000  0.0     0.0     0.0     0.0   0.0     0.0     -1   -1.00
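The size of the 6D search reported in the log can be checked by hand. The sketch below assumes the garbled log entries read as a distance range of 0.00 to 19.50 in steps of 0.75, 812 receptor rotations, 1 ligand rotation, and an FFT grid of 64 x 24 x 48; these readings are an interpretation of the extracted text, chosen because they reproduce the totals the log prints. The helper `distance_steps` is hypothetical.

```python
def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of sampled intermolecular distances, end points included."""
    return int(round((r_max - r_min) / step)) + 1


# Values read from the docking log (assumed punctuation, see lead-in).
steps = distance_steps(0.0, 19.5, 0.75)
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

# One 3D FFT is run per (distance, receptor rotation, ligand rotation).
fft_count = steps * receptor_rotations * ligand_rotations
# Each FFT scores every point of the grid in a single pass.
orientations = fft_count * fft_grid[0] * fft_grid[1] * fft_grid[2]

print(steps, fft_count, orientations)
```

Under these assumptions the products match the log's figures of 21924 3D FFTs and 1616412672 orientations, which supports reading the search as 27 distance steps.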
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, each has an important role in survival in a different host species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structure of a protein present in the different species, we performed homology modelling and put forward a theoretical model using available online modelling tools.
In our study, the HBeAg (Glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a large problem. Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro -
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search
Loaded the target sequence (pdb format) in SWISS-MODEL as a raw sequence and modeled the receptor
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS)
Verified our model against different parameters, such as the Ramachandran plot and others available in SAVS
Selected the best ligand for HBV disease from the KEGG database
Ran HEX and found the structure of the drug molecule
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the query sequence or target). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as templates or parent structures) likely to resemble the structure of the query
sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side-chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main-chain and side-chain dihedral
angles are transferred from the templates to the target. Thus a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
thus the interactions are said to be rigid.
Fig: Protein-protein docking
Protein receptor-ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand complex can have either a rigid ligand and a flexible receptor or a
flexible ligand with a rigid receptor.
Fig: Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be allowed for with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
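The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names, the Z-Y-X rotation order and the sample coordinates are hypothetical and are not taken from HEX or any other docking program.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Rotation matrix R = Rz * Ry * Rx for angles given in radians.

    The Z-Y-X composition order is an assumption made for this sketch.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return (
        (cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx),
        (sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx),
        (-sy, cy * sx, cy * cx),
    )

def rigid_transform(points, angles, shift):
    """Apply three rotations and three translations to a list of (x, y, z) points."""
    rot = rotation_matrix(*angles)
    moved = []
    for p in points:
        rotated = tuple(sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
        moved.append(tuple(rotated[i] + shift[i] for i in range(3)))
    return moved

# Hypothetical two-atom "ligand": rotate 90 degrees about z, then move 2 units along z.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = rigid_transform(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 2.0))
```

Because the transform is rigid, the distance between the two atoms is unchanged; a rigid-body docking search only varies these six numbers for the ligand pose.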
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking should be
useful in the search for a cure for hepatitis B, and may also open
new avenues for finding other possible drug targets in HBV. The
docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell or on the cell surface to which a substance (such as
a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a ligand
binds through many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate; these rotations are
described by the torsion angles φ and ψ, between which the plot is
drawn. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii, and the angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
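Each point on such a plot comes from two torsion angles, and each torsion angle is computed from four consecutive backbone atoms (C, N, Cα, C for φ; N, Cα, C, N for ψ). The following Python sketch shows one common way to compute a torsion angle from four points; the function names and the test coordinates are hypothetical, and this is not the code used by SAVS or PROCHECK.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], defined by four points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: cis (0 degrees), trans (180) and a 90 degree twist.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
twist = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

Applying such a function to every residue of a model gives the (φ, ψ) pairs that a validation server then compares against the allowed regions.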
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and may stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
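The idea of gradient optimization can be shown on the smallest possible case: two atoms that start too close together. The Python sketch below assumes a Lennard-Jones 12-6 potential in reduced units (epsilon = sigma = 1) and a fixed-step steepest-descent loop; the potential, step size and tolerance are assumptions chosen for illustration and do not describe the force field or minimizer of any specific modelling package.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize_distance(r, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is tiny."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r

r_start = 0.9                      # atoms too close: strongly repulsive contact
r_min = minimize_distance(r_start)
e_start = lj_energy(r_start)       # large positive energy from one bad contact
e_min = lj_energy(r_min)           # close to -epsilon at the minimum
```

The minimizer relaxes the bad contact to the nearby minimum at r = 2^(1/6) sigma; like any local method, it finds only the nearest minimum and cannot cross energy barriers.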
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db AC        Description                                                        Score  E-value
pdb 1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb 2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb 2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb 1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb 1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27     60
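The E-value rule described in the template selection section can be applied to this hit list with a few lines of Python. This is a minimal sketch: the cutoff of 1e-5 is an assumed, commonly used threshold rather than a value stated in this report, and the E-value of the last two hits is entered as printed (60), although a decimal point may have been lost in the listing.

```python
# (PDB chain, score, E-value) copied from the hit list above.
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 60.0),   # value as printed; may originally have been 6.0
    ("1AW9-A", 27, 60.0),   # value as printed; may originally have been 6.0
]

E_VALUE_CUTOFF = 1e-5  # assumed threshold, not taken from the report

def candidate_templates(hit_list, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the E-value cutoff, best (lowest) E-value first."""
    kept = [hit for hit in hit_list if hit[2] <= cutoff]
    return sorted(kept, key=lambda hit: hit[2])

templates = [pdb_id for pdb_id, _score, _evalue in candidate_templates(hits)]
```

Only the three hepatitis B capsid structures pass the filter; the two low-scoring hits are rejected whichever reading of their E-value is correct.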
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue    Count   Percent
Ala (A)    74      14.1%
Arg (R)    33      6.3%
Asn (N)    11      2.1%
Asp (D)    21      4.0%
Cys (C)    5       1.0%
Gln (Q)    32      6.1%
Glu (E)    28      5.4%
Gly (G)    51      9.8%
His (H)    16      3.1%
Ile (I)    15      2.9%
Leu (L)    81      15.5%
Lys (K)    10      1.9%
Met (M)    12      2.3%
Phe (F)    10      1.9%
Pro (P)    28      5.4%
Ser (S)    27      5.2%
Thr (T)    22      4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)    38      7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
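The two extinction coefficients can be checked by hand from the composition above. The Python sketch below assumes the commonly quoted per-residue molar values at 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); these constants and the function name are assumptions of this sketch, not something stated in the report.

```python
# Assumed per-residue molar extinction values at 280 nm (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient from residue counts.

    If all_cys_paired is True, every pair of Cys is counted as one cystine.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine

MOLECULAR_WEIGHT = 55252.6  # from the ProtParam output above

# Counts from the composition table: Trp 4, Tyr 5, Cys 5.
ext_paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(4, 5, 5, all_cys_paired=False)

# Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight.
abs_paired = round(ext_paired / MOLECULAR_WEIGHT, 3)
abs_reduced = round(ext_reduced / MOLECULAR_WEIGHT, 3)
```

With these inputs the sketch reproduces the figures listed above (29700 and 0.538 with cystines, 29450 and 0.533 without).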
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65 2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
56
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
57
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
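Percent identity between two rows of such an alignment can be computed directly from the aligned strings. The Python sketch below is illustrative only: it uses one common definition (identical residues divided by the number of columns where neither row has a gap), which is an assumption and may differ from the scoring ClustalW2 uses for its scores table, and it is applied to a 32-column fragment of the alignment rather than to the full sequences.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Fragments of the HBVA4 and HBVA5 rows from the alignment block ending at 117.
hbva4_fragment = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva5_fragment = "QAILCWGKLMTLATWVGNNLEDPASRDLVVNY"

identity = percent_identity(hbva4_fragment, hbva5_fragment)
```

The two fragments differ at a single position (E versus K), so the function reports 31 matches out of 32 compared columns.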
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
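The guide tree is written in Newick notation, where each leaf appears as name:branch_length. As a minimal sketch, the leaf branch lengths can be pulled out with a regular expression in Python. The tree string below assumes that the colons, commas and decimal points lost in the printed listing are restored in the usual Newick form; the function name is hypothetical, and a full Newick parser would be needed to recover the tree topology.

```python
import re

# Guide tree with separators and decimal points restored (an assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length.

    Internal branches follow a closing parenthesis and have no name,
    so the pattern below does not match them.
    """
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

lengths = leaf_branch_lengths(GUIDE_TREE)
longest_leaf = max(lengths, key=lengths.get)
```

The leaf with by far the longest terminal branch is the human glycerate kinase sequence, which agrees with its very low pairwise scores in the scores table.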
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from file (pdb file); choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - Save - Layer (pdb)
Choose icon - File - Save - Project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - File - Save - Layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode: Request submission form
Please fill these fields:
Your e-mail address: Lakshay1202@gmail.com (MUST be correct)
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows which combinations of φ and ψ are possible for a polypeptide and displays the φ and ψ values of each residue in the protein. Residues that fall outside the allowed regions of the plot point to probable errors in the backbone geometry of the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
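The φ and ψ values plotted in a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The following sketch shows how such an angle can be computed from four sets of coordinates. It is an illustration only, not the code used by SAVS or PROCHECK, and the function name `dihedral` and the test coordinates are assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    # atan2 keeps the sign of the rotation about the central bond b2.
    y = _dot(_cross(n1, n2), b2) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Idealised test geometry: cis (0 degrees), +90 degrees and trans (180 degrees).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

For φ of residue i the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i); for ψ they would be N(i), Cα(i), C(i) and N(i+1).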
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Residues      %
Residues in most favoured regions [A,B,L]                    990   85.6
Residues in additional allowed regions [a,b,l,p]             104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0
Residues in disallowed regions                                51    4.4
                                                            ----  -----
Number of non-glycine and non-proline residues              1156  100.0
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
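The percentages in the PROCHECK summary follow directly from the residue counts, taken over the non-glycine, non-proline residues. The short sketch below recomputes them and applies the 90% rule of thumb quoted above; the variable and function names are illustrative only.

```python
# Residue counts per Ramachandran region, as listed in the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(region_counts.values())
    return {region: 100.0 * n / total for region, n in region_counts.items()}


percentages = region_percentages(counts)
for region, pct in percentages.items():
    print(f"{region:20s} {pct:5.1f}%")

# Rule of thumb: a good model has over 90% in the most favoured regions.
meets_rule_of_thumb = percentages["most favoured"] > 90.0
print("meets 90% rule of thumb:", meets_rule_of_thumb)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, so its backbone geometry should be treated with some caution.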
Docking result (Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00, R = 0.75, R = 1.50, R = 2.25, R = 3.00, R = 3.75, R = 4.50, R = 5.25, R = 6.00,
R = 6.75, R = 7.50, R = 8.25, R = 9.00, R = 9.75, R = 10.50, R = 11.25, R = 12.00, R = 12.75,
R = 13.50, R = 14.25, R = 15.00, R = 15.75, R = 16.50, R = 17.25, R = 18.00, R = 18.75, R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00, R = 0.40, R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
------------------------------------------------------------------------------
Clst  Soln  Models   Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp    RMS
------------------------------------------------------------------------------
   1     1  000:000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000:000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
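The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below assumes one reading of the log: the three "Iterate" factors are the number of distance steps, receptor rotations and ligand rotations, and the three "FFT" factors are the dimensions of the grid evaluated by each 3D FFT. The decimal points and separators in the log values are reconstructed, so the numbers should be treated as assumptions.

```python
# Distance range and step size, as read from the log (reconstructed decimals).
r_min, r_max, r_step = 0.0, 19.50, 0.75
distance_steps = round((r_max - r_min) / r_step) + 1

# Initial rotational increments and FFT grid, as read from the log.
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

# One 3D FFT is run per (distance, receptor rotation, ligand rotation).
fft_count = distance_steps * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = fft_count * orientations_per_fft

print("distance steps:     ", distance_steps)
print("3D FFTs:            ", fft_count)
print("total orientations: ", total_orientations)
```

Under these assumptions the product reproduces both the 21924 FFTs and the 1616412672 orientations that the log reports.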
Conclusion
After analysing the protein sequences of hepatitis B virus (HBV), we conclude
that although they are all closely related, they play an important role in
survival in different host species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that
the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be
possible in the near future to draw firm conclusions about the true picture of
evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling, and went a step further by
presenting a theoretical model built with the available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its activity,
as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such
discoveries; it may be a small finding on a big issue. Phylogenetics is the
field of biology that deals with identifying and understanding the relationships
between the many different kinds of life on earth. This includes methods for
collecting and analysing data as well as the interpretation of the results as
new biological information.
The purpose of modelling is to help drug developers and biotechnologists to
develop drugs more efficiently and more effectively in future by analysing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for
the infectious disease bird flu; they will also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid
the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible
therapeutic sites can be identified in them. The same method can also be applied
to other infectious diseases, and hence we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope, however, that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] PubMed.
[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D. et al. 2000. InterPro - an integrated documentation resource for protein families, domains and functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Homology modeling
In protein structure prediction, homology modeling, also known as comparative
modeling, is a class of methods for constructing an atomic-resolution model of a protein
from its amino acid sequence (the "query sequence" or "target"). Almost all homology
modeling techniques rely on the identification of one or more known protein structures
(known as "templates" or "parent structures") likely to resemble the structure of the
query sequence, and on the production of an alignment that maps residues in the query
sequence to residues in the template sequence. The sequence alignment and template
structure are then used to produce a structural model of the target. Because protein
structures are more conserved than protein sequences, detectable levels of sequence
similarity usually imply significant structural similarity.
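The alignment mentioned above is, in effect, a lookup table from target residues to template residues. The sketch below builds such a table from a gapped pairwise alignment. The two short sequences are invented for illustration and are not taken from this project; the function name `residue_map` is hypothetical.

```python
def residue_map(target_aln, template_aln):
    """Map 1-based target residue numbers to 1-based template residue numbers."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    target_pos = template_pos = 0
    for target_res, template_res in zip(target_aln, template_aln):
        # Advance a counter only when that sequence has a residue in the column.
        if target_res != "-":
            target_pos += 1
        if template_res != "-":
            template_pos += 1
        # Only columns with a residue in both sequences give a correspondence.
        if target_res != "-" and template_res != "-":
            mapping[target_pos] = template_pos
    return mapping


# Toy alignment (hypothetical sequences).
target = "MKT-AYIAK"
template = "MRTQAY--K"
print(residue_map(target, template))
# {1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 8: 7}
```

Target residues 6 and 7 have no template partner in this toy case; in a real model, such stretches are the regions that must be built without a template, for example by loop modeling.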
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment
gaps (commonly called indels) that indicate a structural region present in the target but
not in the template, and by structure gaps in the template that arise from poor resolution
in the experimental procedure (usually X-ray crystallography) used to solve the structure.
Model quality declines with decreasing sequence identity; a typical model has ~2 Å
agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å
agreement at 25% sequence identity. Regions of the model that were constructed without
a template, usually by loop modeling, are generally much less accurate than the rest of
the model, particularly if the loop is long. Errors in side chain packing and position also
increase with decreasing identity, and variations in these packing configurations have
been suggested as a major reason for poor model quality at low identity [2]. Taken
together, these various atomic-position errors are significant and impede the use of
homology models for purposes that require atomic-resolution data, such as drug design
and protein-protein interaction predictions; even the quaternary structure of a protein may
be difficult to predict from homology models of its subunit(s). Nevertheless, homology
models can be useful in reaching qualitative conclusions about the biochemistry of the
query sequence, especially in formulating hypotheses about why certain residues are
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
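Agreement between matched Cα atoms, as quoted above in Å, is usually reported as a root-mean-square deviation (RMSD). The sketch below computes it for two lists of coordinates that are assumed to be already superimposed and listed in the same residue order; it does not perform the superposition itself. The coordinates are invented for illustration and the function name `ca_rmsd` is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units between paired, already superimposed points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: every model atom is displaced by 2 Å along y.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
print(ca_rmsd(reference, model))  # 2.0
```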
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
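The E-value rule described above can be sketched as a simple filter over a list of search hits. The hit list, the template identifiers and the cutoff of 1e-5 are all hypothetical values chosen for illustration; a suitable cutoff depends on the database and on the project.

```python
# Hypothetical search hits: (template ID, E-value, percent identity).
hits = [
    ("tmplB", 2e-08, 27.5),
    ("tmplA", 3e-50, 34.0),
    ("tmplC", 0.8, 22.0),
]

E_VALUE_CUTOFF = 1e-5  # assumed threshold, not a universal constant


def candidate_templates(search_hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the E-value cutoff, most significant first."""
    kept = [hit for hit in search_hits if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


for template_id, e_value, identity in candidate_templates(hits):
    print(f"{template_id}  E-value = {e_value:.1e}  identity = {identity:.1f}%")
```

A hit such as the hypothetical `tmplC`, with an E-value close to 1, is dropped even though it might be the only alternative, in line with the advice above.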
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions (the fraction of the query sequence structure that can be
predicted from the template) and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as
a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand with a flexible receptor or a
flexible ligand with a rigid receptor.
Fig. Protein ligand-receptor docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
for binding of a receptor with a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu; they will also open
new avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can
aid in the new drug discovery process.
Receptor
A receptor is a residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule in the cell surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through the interaction of many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free
energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles φ against ψ of
amino acid residues in protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii. The angles
which cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
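To make the torsion angles concrete, the following minimal Python sketch computes a dihedral angle from four atom positions. For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). The function name `dihedral` and the toy coordinates are illustrative assumptions; a real analysis would read the coordinates from a PDB file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from any structure): a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the (φ, ψ) pair obtained in this way for every residue gives the Ramachandran plot.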
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default directory, which
have the same name as the original PDB file but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties by residue in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
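The idea of gradient optimization can be illustrated with a deliberately small Python sketch: two atoms interacting through a Lennard-Jones potential, started too close together, are relaxed by steepest descent. The potential, the reduced units (ε = σ = 1), the step size and the cap on the displacement per step are all assumptions chosen for the illustration; real force fields sum many more terms over all atoms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-10, max_iter=10000):
    """Steepest descent: move downhill until the force is negligible."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot overshoot.
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r


r_start = 0.9  # atoms too close: large positive (repulsive) energy
r_min = minimize(r_start)
print("start:", r_start, lj_energy(r_start))
print("minimized:", r_min, lj_energy(r_min))
```

The search settles at the nearest minimum, which for this potential lies at r = 2^(1/6) σ with energy -ε. As noted above, such a procedure only finds the closest minimum and cannot cross energy barriers.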
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                   192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
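The two extinction coefficients can be reproduced from the composition with the standard molar contributions at 280 nm: 1490 per Tyr, 5500 per Trp and 125 per cystine. The Python sketch below is a cross-check under that assumption; the function name `extinction_coefficient` is hypothetical and this is not ProtParam's own code. The residue counts and molecular weight are the values reported above.

```python
# Molar extinction contributions at 280 nm, in M^-1 cm^-1.
TYR, TRP, CYSTINE = 1490, 5500, 125


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Estimate the extinction coefficient from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * TYR + n_trp * TRP + n_cystine * CYSTINE


molecular_weight = 55252.6      # from the ProtParam result
n_tyr, n_trp, n_cys = 5, 4, 5   # from the amino acid composition

for paired in (True, False):
    ext = extinction_coefficient(n_tyr, n_trp, n_cys, paired)
    # Abs 0.1% is the absorbance of a 1 g/l solution.
    print(paired, ext, round(ext / molecular_weight, 3))
```

Both cases agree with the reported values (29700 and 0.538; 29450 and 0.533).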
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix      (Hh): 235 is 44.93%
310 helix        (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states         :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
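The pairwise scores in the table can be read as percent identities. As a minimal sketch, assuming the score is the number of identical residues divided by the number of aligned positions in which neither sequence has a gap, the calculation looks as follows in Python. The function name `percent_identity` and the two short aligned strings are made up for illustration; they are not rows of the alignment in this report.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned pair: 8 compared columns, 7 of them identical.
print(percent_identity("MQLFHL-CL", "MYLFHLSCL"))
```

A score of 2 or 3 between the glycerate kinase and the HBeAg sequences therefore means that they share almost no identical positions, whereas scores of 97 to 98 among the HBVA sequences indicate near-identical proteins.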
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (pdb file).
2) Choose Swiss model, then load the raw target sequence.
3) Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
4) Choose File, then save, then layer (pdb).
5) Choose File, then save, then project (pdb).
6) Choose Swiss model, then submit modeling request. (A new browser window will open; load the pdb file and give the e-mail ID for receiving the modeled structure.)
7) Open the new structure (received by e-mail) and remove the template by selecting the target.
8) Choose File, then save, then layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Your Email address: Lakshay1202@gmail.com
Your Name: Lakshay
Request title: Lakshay project
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input for SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
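The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked in the same step. The Python sketch below does this for the counts reported above; the function name `ramachandran_summary` and its `threshold` parameter are hypothetical names used only for this illustration.

```python
def ramachandran_summary(counts, threshold=90.0):
    """Return region percentages and whether the most favoured share passes."""
    total = sum(counts.values())  # non-glycine, non-proline residues
    percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    return total, percent, percent["most favoured"] > threshold


# Counts taken from the PROCHECK plot statistics above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total, percent, passes = ramachandran_summary(counts)
print(total, percent, passes)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% expected for a good quality structure.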
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests
that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it
will be possible in the near future to draw firm conclusions about the true
picture of its evolution and to identify the genes responsible for
pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.

To determine the unknown structure of a protein present in the different
species, we carried out homology modelling. We took a step forward by
presenting a theoretical model built with available online modelling tools.

The HBeAg-binding protein (glycerate kinase) studied here is reported to be
one of the secondary causes of HBV pathogenicity. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity,
which could serve as a basis for drug development.
Future prospects
The work presented in this report may serve as a stepping stone towards such
discoveries. It is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as the
interpretation of the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.

As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus. The
docking results can be used to design new lead compounds and hence can aid
the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope, however, that these findings will go
a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
conserved, which may in turn lead to experiments to test those hypotheses.
For example, the spatial arrangement of conserved residues may suggest
whether a particular residue is conserved to stabilize the folding, to
participate in binding some small molecule, or to foster association with
another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the
target sequence to be modelled. Second, spatial features such as Cα-Cα
distances, hydrogen bonds, and main chain and side chain dihedral
angles are transferred from the templates to the target. Thus, a number
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of
a structural genomics consortium dedicated to the production of
representative experimental structures for all classes of protein folds. The
chief inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the
best template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is the most
common example, iteratively update their position-specific scoring matrix to
successively identify more distantly related homologs. This family of
methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant
relationships to any solved structure. Protein threading, also known as fold
recognition or 3D-1D alignment, can also be used as a search technique for
identifying templates to be used in traditional homology modeling methods.
When performing a BLAST search, a reliable first approach is to identify
hits with a sufficiently low E-value, which are considered sufficiently
close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function
similar to that of the query sequence, or it may belong to a homologous
operon. However, a template with a poor E-value should generally not be
chosen, even if it is the only one available, since it may well have a wrong
structure, leading to the production of a misguided model. A better approach
is to submit the primary sequence to fold-recognition servers or, better
still, consensus meta-servers, which improve upon individual
fold-recognition servers by identifying similarities (consensus) among
independent predictions.
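The E-value rule described above can be sketched in a few lines of Python. This is only an illustration: the hits are the five database search results listed in the Result and discussion section of this report, the two weak E-values are read here as 6.0 (an assumption, since the decimal point is lost in the source), and the cutoff of 1e-5 and the function name `select_templates` are hypothetical choices, not values used in the study.

```python
# Hits taken from the search table in this report: (PDB chain, score, E-value).
# The two 6.0 E-values are an assumed reading of a garbled value in the source.
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]

E_VALUE_CUTOFF = 1e-5  # hypothetical threshold, chosen only for illustration


def select_templates(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E-value is below the cutoff, lowest E-value first."""
    kept = [hit for hit in hits if hit[2] < cutoff]
    return sorted(kept, key=lambda hit: hit[2])


candidates = select_templates(hits)
for pdb_id, score, e_value in candidates:
    print(pdb_id, score, e_value)
```

With this cutoff only the three capsid structures survive; the two hits with a score of 27 are rejected as too weak to serve as templates.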
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from multiple
templates, most methods rely on a single template. Therefore, choosing the
best template from among the candidates is a key step and can affect the
final accuracy of the structure significantly. This choice is guided by
several factors, such as the similarity of the query and template sequences,
of their functions, and of the predicted query and observed template
secondary structures. Perhaps most important are the coverage of the aligned
regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
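Two of the criteria named above, coverage and sequence similarity, can be computed directly from a pairwise alignment. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" for gaps; the function name `alignment_stats` and the toy sequences are invented for illustration and do not come from the study.

```python
def alignment_stats(query_aln, template_aln):
    """Return (coverage, identity) for two gapped, equal-length aligned strings.

    coverage: fraction of query residues aligned to a template residue
    identity: fraction of aligned residue pairs that are identical
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    query_len = sum(1 for q in query_aln if q != "-")
    # Keep only columns where both sequences have a residue.
    pairs = [(q, t) for q, t in zip(query_aln, template_aln)
             if q != "-" and t != "-"]
    identical = sum(1 for q, t in pairs if q == t)
    coverage = len(pairs) / query_len if query_len else 0.0
    identity = identical / len(pairs) if pairs else 0.0
    return coverage, identity


# Toy alignment, invented for illustration only.
coverage, identity = alignment_stats("MKT-AYIAKQR", "MKTLAY--KQK")
print(f"coverage = {coverage:.3f}, identity = {identity:.3f}")
```

A template with high identity but low coverage can only model part of the query, which is why both numbers are usually considered together.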
It is possible to use the sequence alignment generated by the database
search technique as the basis for the subsequent model production; however,
more sophisticated approaches have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do
not have the ability to alter their conformation in order to improve binding
and ease movement. Conformational changes are limited by steric constraints,
and the interfaces are therefore said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred
to as a lock-and-key mechanism. There is both high specificity and induced
fit within these interfaces, with specificity increasing with rigidity.
Protein receptor-ligand docking can involve either a rigid ligand with a
flexible receptor or a flexible ligand with a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand bound to a flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows a
receptor to bind a larger than usual ligand. Normally, when there is ligand
overlap in the docking interface, energy penalties are incurred. If the van
der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the receptor.
Flexible receptors allow docking of a larger ligand than would be possible
with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of
the system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though; there is intrinsic
movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of
inherent flexibility allowed the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing for
easier, more energetically favorable binding between the two.
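The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be illustrated with a short Python sketch. It is not the method used by any particular docking program; the rotation order (about x, then y, then z), the function names and the three toy "ligand" coordinates are all assumptions made for illustration.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation R = Rz(gamma) * Ry(beta) * Rx(alpha), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def transform(coords, angles, shift):
    """Apply three rotations and three translations to a list of (x, y, z)."""
    rot = rotation_matrix(*angles)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rot[i][0] * x + rot[i][1] * y + rot[i][2] * z + shift[i]
            for i in range(3)
        ))
    return moved


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy three-atom "ligand", invented for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
posed = transform(ligand, angles=(0.3, -0.7, 1.1), shift=(4.0, -2.0, 0.5))
print(posed)
```

Because the transformation is rigid, every interatomic distance inside the ligand is unchanged; only its position and orientation relative to the receptor differ between poses.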
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study will
be useful in the search for a cure for the infectious disease bird flu, and
will also open new avenues for finding other possible drug targets in
influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological
components. It is a molecule within a cell or on a cell surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change
in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a
ligand binds through the interaction of many weak noncovalent bonds formed
to the binding site of a protein, the tight binding of a ligand depends upon
a precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and
substrate at the active site promotes the formation of the transition state.
The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate
enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a
protein active/functional site, as this greatly depends on the type of
function. With that in mind, below are preferences for the 20 amino acids to
lie within functional regions on proteins. These were worked out by
considering how often particular amino acids were in contact with bound
non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by
chance; negative values mean that it makes fewer. The values below do not
include protein-protein or protein-peptide interactions, where many of the
amino acids with negative values (e.g. tryptophan or proline) can play
critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues
in a protein structure. It shows the possible conformations of the φ and ψ
angles for a polypeptide. In a polypeptide, the main chain N-Cα and Cα-C
bonds are relatively free to rotate. The plot is drawn between the torsion
angles φ and ψ. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable
conformations. For each conformation, the structure was examined for close
contacts between atoms. Atoms were treated as hard spheres with dimensions
corresponding to their van der Waals radii. The angles which cause spheres
to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
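The torsion angles plotted in a Ramachandran diagram are ordinary dihedral angles defined by four consecutive backbone atoms (C-N-Cα-C for φ, N-Cα-C-N for ψ). The sketch below computes a dihedral angle from four points using the standard atan2 formulation; the function names and the test coordinates are illustrative assumptions, not coordinates from the modelled protein.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)          # normal of the plane p1-p2-p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points chosen so that the two planes are perpendicular.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                 (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(angle)
```

Applying such a function to each residue's backbone atoms gives one (φ, ψ) pair per residue, which is the point plotted on the diagram.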
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use, the
server can take several minutes to run. The run time also depends on how
many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has an .out extension
and lists all the computed stereochemical properties by residue in a
printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it will not pass through high energy
barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having
a huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby conformation.
Energy minimization is usually performed by gradient optimization: atoms are
moved so as to reduce the net forces on them. The minimized structure has
small forces on each atom and therefore serves as an excellent starting
point for molecular dynamics simulations.
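The idea of gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. The sketch below is a toy steepest-descent loop, not the minimizer of any modelling package; the reduced units (epsilon = sigma = 1), the step size, the tolerance and the starting distance are assumptions chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr; the force along r is the negative of this value."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move along -gradient until the force is tiny."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r


r_start = 0.95                 # atoms too close: strong repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

Starting from the clashing distance, the energy falls from a positive value to the bottom of the nearest well, at a separation of 2^(1/6) sigma. The loop only ever moves downhill, which is why such a minimizer stops in the nearest local minimum rather than crossing energy barriers.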
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences:
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count  Percent
Ala (A)   74     14.1
Arg (R)   33     6.3
Asn (N)   11     2.1
Asp (D)   21     4.0
Cys (C)   5      1.0
Gln (Q)   32     6.1
Glu (E)   28     5.4
Gly (G)   51     9.8
His (H)   16     3.1
Ile (I)   15     2.9
Leu (L)   81     15.5
Lys (K)   10     1.9
Met (M)   12     2.3
Phe (F)   10     1.9
Pro (P)   28     5.4
Ser (S)   27     5.2
Thr (T)   22     4.2
Trp (W)   4      0.8
Tyr (Y)   5      1.0
Val (V)   38     7.3
Pyl (O)   0      0.0
Sec (U)   0      0.0
(B)       0      0.0
(Z)       0      0.0
(X)       0      0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon C 2435Hydrogen H 3967Nitrogen N 711
Oxygen O 719Sulfur S 17
Formula C2435H3967N711O719S17
Total number of atoms 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1 at 280 nm measured in water
Ext coefficient 29700Abs 01 (=1 gl) 0538 assuming ALL Cys residues appear as half cystines
53
Ext coefficient 29450Abs 01 (=1 gl) 0533 assuming NO Cys residues appear as half cystines
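The two extinction coefficients above can be reproduced from the residue counts. The sketch below assumes the commonly documented molar extinction values at 280 nm in water (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are illustrative and are not part of ProtParam itself.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1); assumed constants.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the molar extinction coefficient from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported above
    for paired in (True, False):
        ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
        print(paired, ext, round(absorbance_1_g_per_l(ext, mw), 3))
```

With 4 Trp, 5 Tyr and 5 Cys this gives 29700 when cystines are counted and 29450 when they are not, in agreement with the values reported for GLCTK_HUMAN.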
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
==============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
==============================================================================
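The pairwise scores in the table can be read as percent identities. The sketch below shows one way such a score can be computed, assuming it is the number of identical residues divided by the number of aligned positions where neither sequence has a gap. The function name is illustrative, and the exact normalisation used by ClustalW2 may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over positions where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # First aligned 30 residues of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
    hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
    print(round(percent_identity(hbva4, hbva5), 2))
```

The fragment used here covers only the first 30 residues, so its value is not expected to equal the full-length score of 97 reported for this pair.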
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
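The guide tree is written in Newick format, so the branch length of each sequence can be pulled out with a short script. The sketch below assumes the tree string with its colons, commas and decimal points restored (the extraction of this report lost them); the helper name is illustrative, and a full phylogenetics library would normally be used for anything beyond this.

```python
import re

# Guide tree as a single Newick string; punctuation restored by assumption.
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Map each leaf label to the length of the branch leading to it."""
    # A leaf label follows "(" or "," directly; internal branches follow ")".
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


if __name__ == "__main__":
    lengths = leaf_branch_lengths(NEWICK)
    for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
        print(f"{name:20s} {length:.5f}")
```

Sorting by branch length puts GLCTK_HUMAN first, which matches its very low pairwise scores against all the HBeAg sequences.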
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project
Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044, Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations of the two angles are possible for a polypeptide. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the protein of interest can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
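Each point on a Ramachandran plot is a pair of dihedral angles, and a dihedral angle is defined by four consecutive atoms. The sketch below computes one such angle from Cartesian coordinates with the standard atan2 formula; the function names and the test coordinates are illustrative and do not come from the modeled structure. For φ the four atoms would be C(i-1), N(i), CA(i), C(i), and for ψ they would be N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range -180 to 180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points with a right-angle twist about the central bond.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to every residue of a structure and plotting φ against ψ gives the scatter that PROCHECK classifies into favoured, allowed and disallowed regions.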
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Score   %age
Residues in most favoured regions [A,B,L]                    990   85.6%
Residues in additional allowed regions [a,b,l,p]             104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0%
Residues in disallowed regions                                 51    4.4%
                                                            ----   ------
Number of non-glycine and non-proline residues              1156   100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127 (59)
Number of proline residues                                     60
                                                            ----
Total number of residues                                     1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASPN Radius = 1.40 Charge = -0.52; ASPCA Radius = 1.50 Charge = 0.25; ASPC Radius = 1.40 Charge = 0.53; ASPO Radius = 1.50 Charge = -0.50; ASPCB Radius = 1.70 Charge = -0.21; ASPCG Radius = 1.40 Charge = 0.62; ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSCE Radius = 1.70 Charge = 0.22; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSCE Radius = 1.70 Charge = 0.22; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSCE Radius = 1.70 Charge = 0.22; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
  ASPN Radius = 1.40 Charge = -0.52; ASPCA Radius = 1.50 Charge = 0.25; ASPC Radius = 1.40 Charge = 0.53; ASPO Radius = 1.50 Charge = -0.50; ASPCB Radius = 1.70 Charge = -0.21; ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
  LYSN Radius = 1.40 Charge = -0.52; LYSCA Radius = 1.50 Charge = 0.23; LYSC Radius = 1.40 Charge = 0.53; LYSO Radius = 1.50 Charge = -0.50; LYSCB Radius = 1.70 Charge = 0.04; LYSCG Radius = 1.70 Charge = 0.05; LYSCD Radius = 1.70 Charge = 0.05; LYSCE Radius = 1.70 Charge = 0.22; LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
  MSEN Radius = 1.40 Charge = -0.52; MSECA Radius = 1.50 Charge = 0.14; MSEC Radius = 1.40 Charge = 0.53; MSEO Radius = 1.50 Charge = -0.50; MSECB Radius = 1.70 Charge = 0.04; MSECG Radius = 1.70 Charge = 0.09; MSESE Radius = 1.90 Charge = 0.32; MSECE Radius = 1.90 Charge = 0.01; MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
  THRN Radius = 1.40 Charge = -0.52; THRCA Radius = 1.50 Charge = 0.27; THRC Radius = 1.40 Charge = 0.53; THRO Radius = 1.50 Charge = -0.50; THRCB Radius = 1.50 Charge = 0.21; THRH Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that, although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will be possible to draw firm conclusions
about the true picture of evolution in the near future, and that the genes
responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To find the unknown structure of the protein
present in the different species, we carried out homology modelling. We went
a step further and present a theoretical model built with available online
modelling tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase),
the protein encoded by this gene, is one of the secondary causes of the
pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which
drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone for such
discoveries. The present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, this will open new vistas for
further drug development. The findings of our docking will be useful in
finding a cure for the infectious disease bird flu, and they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley, and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B., Sheinerman F.B., and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X., and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller
W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389–3402. [PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas
M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro:
An integrated documentation resource for protein families, domains, and
functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
of spatial restraints on its structure are obtained. Third, the 3D model is
obtained by satisfying all the restraints as well as possible.
Homology modeling can produce high-quality structural models when the
target and template are closely related, which has inspired the formation of a
structural genomics consortium dedicated to the production of representative
experimental structures for all classes of protein folds. The chief
inaccuracies in homology modeling, which worsen with lower sequence
identity, derive from errors in the initial sequence alignment and from
improper template selection. Like other methods of structure prediction,
current practice in homology modeling is assessed in a biennial large-scale
experiment known as the Critical Assessment of Techniques for Protein
Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best
template structure, if indeed any are available. The simplest method of
template identification relies on serial pairwise sequence alignments aided
by database search techniques such as FASTA and BLAST. More sensitive
methods based on multiple sequence alignment, of which PSI-BLAST is
the most common example, iteratively update their position-specific scoring
matrix to successively identify more distantly related homologs. This family
of methods has been shown to produce a larger number of potential
templates and to identify better templates for sequences that have only
distant relationships to any solved structure. Protein threading, also known
as fold recognition or 3D-1D alignment, can also be used as a search
technique for identifying templates to be used in traditional homology
modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are
considered sufficiently close in evolution to make a reliable homology
model. Other factors may tip the balance in marginal cases; for example, the
template may have a function similar to that of the query sequence, or it may
belong to a homologous operon. However, a template with a poor E-value
should generally not be chosen, even if it is the only one available, since it
may well have a wrong structure, leading to the production of a misguided
model. A better approach is to submit the primary sequence to
fold-recognition servers or, better still, consensus meta-servers, which improve
upon individual fold-recognition servers by identifying similarities
(consensus) among independent predictions.
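The E-value screen described above can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the template names, and the cutoffs (`max_e_value`, `min_coverage`) are assumptions made for the example and are not values used in this work.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database-search hit (hypothetical record layout)."""
    template_id: str
    e_value: float
    coverage: float  # fraction of the query covered by the alignment


def select_templates(hits, max_e_value=1e-5, min_coverage=0.5):
    """Keep hits that pass both cutoffs, lowest E-value first.

    The cutoffs are illustrative assumptions, not recommended values.
    """
    kept = [h for h in hits
            if h.e_value <= max_e_value and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical hits: two strong candidates and one marginal hit.
hits = [
    Hit("tmplB", 2e-51, 0.85),
    Hit("tmplA", 6e-54, 0.90),
    Hit("tmplC", 6.0, 0.30),
]

selected = select_templates(hits)
print([h.template_id for h in selected])
```

A screen like this only narrows the candidate list; the final choice still depends on the other factors discussed in this section, such as function and coverage.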
Often several candidate template structures are identified by these
approaches. Although some methods can generate hybrid models from
multiple templates, most methods rely on a single template. Therefore,
choosing the best template from among the candidates is a key step and can
affect the final accuracy of the structure significantly. This choice is guided
by several factors, such as the similarity of the query and template
sequences, of their functions, and of the predicted query and observed
template secondary structures. Perhaps most important are the coverage of the
aligned regions, that is, the fraction of the query sequence structure that can be
predicted from the template, and the plausibility of the resulting model.
Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions do not
have the ability to alter their conformation in order to improve binding and
ease movement. Conformational changes are limited by steric constraints, and
the interfaces are thus said to be rigid.
Fig. Protein-protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described as
a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor or a
flexible ligand and a rigid receptor.
Fig. Protein receptor-ligand docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand and flexible receptor often maximizes the
interface area between the molecules. They move with respect to one
another in a direction perpendicular to the interface. This allows
a receptor to bind a larger than usual ligand. Normally, when
there is ligand overlap in the docking interface, energy penalties are incurred.
If the van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the
receptor interface. No docking is completely rigid, though: there is intrinsic
movement, which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further offsets
any energy penalty between the receptor and ligand, allowing for easier,
more energetically favorable binding between the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and they will also
open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Receptor
A receptor is a residue on the surface of the cell that serves as a recognition
or binding site for antigens, antibodies, or other cellular or immunological
components. It is a molecule within a cell or on its surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change in
the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor).
Because a ligand binds through many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein (enzyme) is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the ΔG of activation
of the reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ)
of amino acid residues in a protein structure. It shows the possible
conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the
main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is
drawn between the torsion angles φ and ψ. Ramachandran used computer
models of small polypeptides to vary φ and ψ systematically, with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with dimensions corresponding to their van der Waals radii, and the
angles that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
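As an illustration of how one point on such a plot is obtained, the sketch below computes a signed torsion angle from four atomic positions. For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). The function name and the example coordinates are made up for the illustration and do not come from the modelled structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    0 degrees is the cis (eclipsed) arrangement, 180 degrees is trans.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the last atom is rotated a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Computing φ and ψ this way for every residue and plotting ψ against φ gives the scatter that PROCHECK compares with the allowed regions.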
SAVS (Structure Analysis and Validation Server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how
many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how
unusual, the geometry of the residues in a given protein structure is, as
compared with stereochemical parameters derived from well-refined,
high-resolution structures. The checks also make use of 'ideal' bond lengths
and bond angles, as derived from a recent and comprehensive analysis of small
molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but different
extensions. The residue-by-residue listing has a .out extension and lists all
the computed stereochemical properties by residue in a printable ASCII text
file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles, and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through high
energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
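The idea of gradient optimization can be illustrated with a toy system: two atoms interacting through a Lennard-Jones potential, started too close together so that the repulsion term dominates. This is a minimal sketch in reduced units (epsilon = sigma = 1), not the force field or minimizer used by any modelling server; the step size, the cap on the move per iteration, and the starting distance are assumptions chosen for the example.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent on the separation, moving against the gradient.

    Each move is capped at max_move so that the very large forces of a
    bad contact do not throw the atoms far apart in a single step.
    """
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:  # forces are small: a (local) minimum is reached
            break
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r


r_start = 0.9  # atoms too close: large repulsion energy
r_min = minimize_distance(r_start)
print(f"start: r = {r_start:.3f}, E = {lj_energy(r_start):.3f}")
print(f"end:   r = {r_min:.5f}, E = {lj_energy(r_min):.5f}")
```

The minimizer only walks downhill from its starting point, which is why, as noted above, it ends in the nearest local minimum and cannot cross high energy barriers.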
Result and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result-
List of potentially matching sequences-
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27    60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                 27    60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
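The composition figures reported by ProtParam are simple counts over the sequence. The sketch below shows the arithmetic on the first 60 residues of GLCTK_HUMAN as they appear in the multiple sequence alignment of this report; the function names are hypothetical, and running the same functions on the full 523-residue sequence would be needed to reproduce the table above.

```python
from collections import Counter


def composition(sequence):
    """Return {residue: (count, percent of sequence length)}."""
    counts = Counter(sequence.upper())
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def charged_residue_counts(sequence):
    """Return (Asp + Glu, Arg + Lys), the two totals ProtParam reports."""
    counts = Counter(sequence.upper())
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


# First 60 residues of GLCTK_HUMAN, copied from the alignment section.
fragment = "MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS"

for aa, (count, percent) in composition(fragment).items():
    print(f"{aa} {count:3d} {percent:5.1f}%")

negative, positive = charged_residue_counts(fragment)
print("Asp + Glu:", negative, " Arg + Lys:", positive)
```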
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
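The two extinction coefficients can be checked by hand. ProtParam sums fixed molar contributions at 280 nm of 5500 per Trp, 1490 per Tyr and 125 per cystine (a disulfide-linked pair of Cys), and divides by the molecular weight to obtain the absorbance of a 1 g/l solution. The sketch below applies this to the residue counts listed above (Trp 4, Tyr 5, Cys 5); it assumes that the molecular weight reads 55252.6 once the decimal point lost in extraction is restored, and the function names are illustrative.

```python
TRP_280 = 5500      # molar contribution of one Trp at 280 nm (M-1 cm-1)
TYR_280 = 1490      # molar contribution of one Tyr
CYSTINE_280 = 125   # molar contribution of one cystine (two paired Cys)


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Molar extinction coefficient at 280 nm in M-1 cm-1."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * TRP_280 + n_tyr * TYR_280 + n_cystine * CYSTINE_280


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution over a 1 cm path."""
    return ext_coefficient / molecular_weight


MOLECULAR_WEIGHT = 55252.6  # assumed reading of the ProtParam value above

for paired in (True, False):
    ext = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
    absorbance = absorbance_1_g_per_l(ext, MOLECULAR_WEIGHT)
    print(paired, ext, round(absorbance, 3))
```

With five Cys residues only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.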
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
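The pairwise scores in this table behave like percent identities: near 98 for the HBV strains and near 3 for glycerate kinase against any HBeAg sequence. As a rough illustration, the sketch below computes percent identity over the columns of an alignment in which neither sequence has a gap. This is an assumption about how such a score can be defined, not a reproduction of the exact ClustalW2 calculation, and the function name is hypothetical. The two fragments are the HBVA4 and ASHV rows of the second alignment block in this report, with the leading and trailing gap columns trimmed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Fragments copied from the alignment section (HBVA4 and ASHV rows).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"

print(f"{percent_identity(hbva4, ashv):.1f}")
```

The value for a short fragment will differ from the score in the table, which is computed over the full pairwise alignment.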
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% sequence identity with the target sequence. The template structure was downloaded from the PDB.
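The sequence identity quoted for a template is counted over aligned positions. The sketch below shows one common way to compute it from two rows of an alignment; the function name is hypothetical, and the convention of ignoring gapped columns is an assumption (alignment tools differ in which denominator they use). The two example rows are the HBVA4 and HBVA3 segments from the alignment block above, with trailing gaps trimmed.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over columns where neither row has a gap.

    Returns (percent, identical_columns, aligned_columns).
    Assumption: gapped columns are excluded from the denominator.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0, 0, 0
    return 100.0 * identical / aligned, identical, aligned


# Rows taken from the alignment in this report (HBVA4 vs HBVA3).
hbva4 = "RRRSQSPRRRRSQSRESQC"
hbva3 = "RRRSPSPRRRRSQSRESQC"
pct, identical, aligned = percent_identity(hbva4, hbva3)
print(f"{identical}/{aligned} identical columns ({pct:.1f}%)")
```

The two segments differ at a single position (Q versus P), so the identity over this short stretch is high; the figure for a whole template is computed the same way over the full alignment.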
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1) Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
2) Choose Fit - fit raw sequence, then magic fit, then iterative fit.
3) Choose File - save - layer (pdb).
4) Choose File - save - project (pdb).
5) Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID at which the modeled structure is to be received.)
6) Open the new structure (received by e-mail) and remove the template by selecting the target.
7) Choose File - save - layer (pdb).
Open the Swiss model menu and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit the project for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit P000044 Title Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations of the two angles are possible for a polypeptide, so residues that fall outside the allowed regions point to likely errors in the model.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
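Each point on a Ramachandran plot is a pair of dihedral (torsion) angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python. The function and helper names are hypothetical, and the coordinates are made-up test points rather than atoms from the modeled structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the last atom is rotated a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Applying the function to every residue of a PDB file and plotting φ against ψ gives the scatter that PROCHECK classifies into favoured, allowed and disallowed regions.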
SAVES results for proj_gunjanpdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Score    %-age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
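The percentages in the PROCHECK summary follow directly from the residue counts, and the quality criterion is a simple threshold on the first row. The sketch below recomputes them from the counts reported above; the function name and dictionary keys are hypothetical.

```python
def ramachandran_percentages(counts):
    """Convert per-region residue counts into percentages of their total."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


# Counts of non-glycine, non-proline residues from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
percentages = ramachandran_percentages(counts)
meets_criterion = percentages["most favoured"] > 90.0  # PROCHECK rule of thumb

for region, value in percentages.items():
    print(f"{region:20s} {value:5.1f}%")
print("over 90% in most favoured regions:", meets_criterion)
```

With 990 of 1156 residues in the most favoured regions, the model falls short of the 90% guideline, which suggests that further refinement would be needed before it could be called a good quality model.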
Docking result by Hex software
Fig Ligand & Receptor (2B8N)
Fig After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
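The size of the search reported in the Hex log can be checked by hand. The sketch below rebuilds the distance steps from the stated range (0 to 19.50 in steps of 0.75) and multiplies them by the number of receptor and ligand rotations and by the FFT grid size. The split of the logged figures into 27 x 812 x 1 and 64 x 24 x 48 is an assumption made here because punctuation was lost in the log; it is the reading that reproduces the logged totals. The function name is hypothetical.

```python
import math


def distance_steps(start, stop, step):
    """Inclusive list of intermolecular distances from start to stop."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


steps = distance_steps(0.0, 19.5, 0.75)  # distance range from the log
receptor_rotations = 812                 # "Receptor 812" in the log
ligand_rotations = 1                     # "Ligand 1" in the log
fft_grid = (64, 24, 48)                  # assumed reading of "FFT[642448]"

n_ffts = len(steps) * receptor_rotations * ligand_rotations
n_orientations = n_ffts * math.prod(fft_grid)

print(len(steps), "distance steps")
print(n_ffts, "3D FFTs")
print(n_orientations, "orientations in the 6D search space")
```

Both products match the totals printed in the log (21924 3D FFTs and 1616412672 orientations), which supports this reading of the garbled figures.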
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, each plays an important role in survival in a different host species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To find the unknown structures of the proteins present in the different species, we carried out homology modelling. We took a step forward by presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries; it is a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801.
Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro - An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviation
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step, and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
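The template selection criteria described above (a low E-value first, then sequence similarity and coverage) can be sketched as a small filter-and-rank routine. Everything in this sketch is illustrative: the class and function names are hypothetical, the E-value cutoff of 1e-5 and the scoring rule (identity multiplied by coverage) are assumptions rather than a published standard, and only the 2b8nA entry uses values reported in this document (34% identity, E-value 2.70e-52, residues 83 to 514 of a 523-residue query). The other two hits are invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str
    e_value: float
    identity_pct: float
    aligned_residues: int


def rank_templates(hits, query_length, max_e_value=1e-5):
    """Drop hits with a poor E-value, then rank by identity x coverage."""
    kept = [h for h in hits if h.e_value <= max_e_value]

    def score(hit):
        coverage = hit.aligned_residues / query_length
        return (hit.identity_pct / 100.0) * coverage

    return sorted(kept, key=score, reverse=True)


hits = [
    TemplateHit("2b8nA", 2.70e-52, 34.0, 514 - 83 + 1),  # from this report
    TemplateHit("hypoA", 3.0e-2, 28.0, 150),             # hypothetical hit
    TemplateHit("hypoB", 1.0e-20, 30.0, 200),            # hypothetical hit
]
ranked = rank_templates(hits, query_length=523)
for hit in ranked:
    print(hit.name, hit.e_value, hit.identity_pct)
```

In practice the final choice would also weigh function, secondary structure agreement and the plausibility of the resulting model, which a single numeric score does not capture.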
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies aiming at finding a proper fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus the interactions are said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can have either a rigid ligand and a flexible receptor or a flexible ligand with a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow for docking of a larger ligand than would be allowed for with a rigid receptor.
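The energy penalty for overlapping atoms can be illustrated with the Lennard-Jones 12-6 potential, a standard textbook model of van der Waals interactions. This is a sketch only: the source does not say which energy function was used, and the epsilon and sigma values below are arbitrary illustrative parameters, not values for any real atom pair.

```python
def lennard_jones(r, epsilon=1.0, sigma=3.5):
    """Lennard-Jones 12-6 energy for two atoms separated by distance r.

    epsilon is the well depth and sigma the distance at which the energy
    crosses zero; both defaults are arbitrary illustrative values.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


for distance in (2.8, 3.5, 2 ** (1 / 6) * 3.5, 6.0):
    print(f"r = {distance:5.2f}  E = {lennard_jones(distance):8.3f}")
```

The energy is strongly positive when the atoms overlap (r below sigma), reaches its minimum of -epsilon at r = 2^(1/6) sigma, and decays towards zero at long range. A flexible receptor can move overlapping atoms out of the repulsive region, which is how flexibility removes the penalty.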
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
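The six degrees of freedom mentioned above correspond to a rigid-body pose: three rotation angles and a three-component translation applied to every atom of the ligand. The sketch below shows one way to apply such a pose in plain Python. The function names are hypothetical, the choice of rotations about the x, y and z axes applied in that order is an assumption (docking programs use various angle conventions), and the two-atom "ligand" is a made-up example.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation by alpha about x, then beta about y, then gamma about z."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cb * cg, sa * sb * cg - ca * sg, ca * sb * cg + sa * sg],
        [cb * sg, sa * sb * sg + ca * cg, ca * sb * sg - sa * cg],
        [-sb, sa * cb, ca * cb],
    ]


def apply_pose(coords, rotation, translation):
    """Rotate each point, then translate it (3 + 3 degrees of freedom)."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


# Made-up two-atom ligand, turned a quarter turn about z and shifted.
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
pose = apply_pose(ligand, rotation_matrix(0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
print(pose)
```

Because the transformation is rigid, distances between ligand atoms are unchanged; a rigid-body docking search scores many such poses, while flexible docking additionally varies internal coordinates.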
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies, or other cellular or immunological components.
It is a molecule within a cell or on the cell surface to which a substance
(such as a hormone or a drug) selectively binds, causing a change in the
activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). Because a
ligand binds through many weak noncovalent bonds formed with the binding site
of a protein, tight binding of a ligand depends upon a precise fit to the
surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation
free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the dihedral angle φ (phi) against ψ (psi) for the amino acid
residues in a protein structure. It shows the possible conformations of the
φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and
Cα-C bonds are relatively free to rotate, and these rotations are described
by the torsion angles φ and ψ respectively. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were treated
as hard spheres with dimensions corresponding to their van der Waals radii.
Angles that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
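As a rough illustration of how the φ and ψ angles plotted on a Ramachandran map can be obtained from atomic coordinates, the sketch below computes a signed torsion angle from four points. The helper names and the synthetic test points are assumptions for illustration; a real analysis would read backbone atoms (C of the previous residue, N, Cα, C, and N of the next residue) from a PDB file.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for one residue."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)

# Synthetic points (not real protein coordinates) giving a +90 degree torsion.
angle_90 = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
# Same first three points with the fourth placed opposite the first: trans, 180 degrees.
angle_trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
```

Each residue then contributes one (φ, ψ) point to the plot, and points falling in sterically disallowed regions flag questionable backbone geometry.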
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects, the server
can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file, which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension and
lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it will not pass through high energy
barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
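The idea of gradient optimization can be sketched on the simplest possible case: two atoms placed too close together and interacting through a Lennard-Jones 12-6 potential. The potential form, the reduced units (epsilon = sigma = 1), the step size and the function names are all assumptions chosen for illustration; they do not reproduce the force field or minimizer used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (the force is its negative)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r0, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move downhill until the force magnitude falls below tol."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_start = 0.95                 # atoms too close: strongly repulsive region
r_min = minimize(r_start)      # relaxes toward the nearby minimum at 2**(1/6) * sigma
e_start, e_min = lj_energy(r_start), lj_energy(r_min)
```

The starting energy is positive because of the repulsive overlap, while the minimized pair sits at the bottom of the well. As the text notes, a descent of this kind only finds the nearest minimum and cannot cross energy barriers.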
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                         Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count  Percent
Ala (A)   74     14.1%
Arg (R)   33      6.3%
Asn (N)   11      2.1%
Asp (D)   21      4.0%
Cys (C)    5      1.0%
Gln (Q)   32      6.1%
Glu (E)   28      5.4%
Gly (G)   51      9.8%
His (H)   16      3.1%
Ile (I)   15      2.9%
Leu (L)   81     15.5%
Lys (K)   10      1.9%
Met (M)   12      2.3%
Phe (F)   10      1.9%
Pro (P)   28      5.4%
Ser (S)   27      5.2%
Thr (T)   22      4.2%
Trp (W)    4      0.8%
Tyr (Y)    5      1.0%
Val (V)   38      7.3%
Pyl (O)    0      0.0%
Sec (U)    0      0.0%
(B)        0      0.0%
(Z)        0      0.0%
(X)        0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
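The two extinction coefficients above can be reproduced from the residue counts. The sketch below assumes the molar extinction values documented for ProtParam (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1 at 280 nm) and takes the molecular weight as 55252.6 Da, that is, the reported value with its decimal point restored. The function names are illustrative only.

```python
# Molar extinction contributions at 280 nm (M-1 cm-1), as documented for ProtParam.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimate the 280 nm extinction coefficient from residue counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight

# Counts reported for GLCTK_HUMAN: Trp 4, Tyr 5, Cys 5.
MW = 55252.6
ext_paired = extinction_coefficient(4, 5, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(4, 5, 5, all_cys_paired=False)
abs_paired = round(absorbance_1_g_per_l(ext_paired, MW), 3)
abs_reduced = round(absorbance_1_g_per_l(ext_reduced, MW), 3)
```

With five cysteines only two cystines can form, which accounts for the difference of 250 between the two reported coefficients.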
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states    :   0 is  0.00%
Other states        :   0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: Aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
=============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
=============================================================================
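The pairwise scores in the table behave like percent identities: near 98 within the HBV genotype A sequences, 2 to 3 against the human protein. A minimal sketch of such a score for two aligned rows follows, assuming the score is the number of identical residues divided by the number of positions compared, with gapped positions skipped. The function name is hypothetical, and ClustalW computes its scores from separate pairwise alignments, so values taken from rows of the multiple alignment need not match the table exactly.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two aligned rows, skipping positions with a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# C-terminal fragments of HBEAG_HBVA4 and HBEAG_HBVA3 from the alignment in this report.
frag_hbva4 = "RRRSQSPRRRRSQSRESQC"
frag_hbva3 = "RRRSPSPRRRRSQSRESQC"
frag_identity = percent_identity(frag_hbva4, frag_hbva3)   # 18 of 19 positions match

# Toy rows: the gapped column is ignored, leaving 3 matches out of 4 compared.
toy_identity = percent_identity("AC-GT", "ACAGA")
```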
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C, which showed around 85.6% identity with the target sequence, was selected as the template, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out
Steps involved in SPDBV
1 Open the template structure from File (PDB file), then choose Swiss model - load the raw target sequence
2 Choose Fit - fit raw sequence, then magic fit, then iterative fit
3 Choose File - save - layer (pdb)
4 Choose File - save - project (pdb)
5 Choose Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
6 Open the new structure (received by e-mail) and remove the template by selecting the target
7 Choose File - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide by displaying the phi and psi backbone conformational angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score   %age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
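The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked in the same step. The sketch below uses the counts reported above; the dictionary keys and function name are illustrative assumptions, not PROCHECK identifiers.

```python
# Residue counts per Ramachandran region, as reported in the PROCHECK summary.
region_counts = {
    "most_favoured": 990,
    "additional_allowed": 104,
    "generously_allowed": 11,
    "disallowed": 51,
}

def region_percentages(counts):
    """Percentage of non-glycine, non-proline residues in each region (one decimal)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}

percentages = region_percentages(region_counts)
total_non_gly_pro = sum(region_counts.values())

# PROCHECK guideline: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most_favoured"] > 90.0
```

With 85.6% of residues in the most favoured regions, the model analysed here falls somewhat short of the 90% guideline.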
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N 2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1    1    000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
1    1    000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
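The size of the search space reported in the docking log can be checked with a few lines of Python. This is only a sketch: it assumes that the bracketed log fields read as 27 x 812 x 1 iterations and a 64 x 24 x 48 FFT grid, and that the distance range runs from 0.00 to 19.50 in steps of 0.75 (the separators and decimal points are missing from the extracted log, so these readings are an interpretation). The helper name distance_steps is hypothetical.

```python
def distance_steps(start, stop, step):
    """List the intermolecular distances scanned, both end points included."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


# Values read from the log (assumed decimal placement: 0.00 to 19.50, step 0.75)
steps = distance_steps(0.0, 19.5, 0.75)

# Assumed reading of "Receptor 812" and of the FFT grid dimensions
receptor_rotations = 812
fft_grid = (64, 24, 48)

# One 3D FFT per (distance step, receptor rotation) pair
n_ffts = len(steps) * receptor_rotations

# Each FFT scores every cell of the grid at once
n_orientations = n_ffts * fft_grid[0] * fft_grid[1] * fft_grid[2]

print(len(steps), n_ffts, n_orientations)
```

Under these assumptions the products agree with the 21924 FFTs and 1616412672 orientations reported in the log.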
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different host species. It would be interesting to take a closer
look at the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will become possible in the near future to draw firm conclusions
about the true picture of its evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone for further
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search
for a cure for the infectious disease bird flu, and may also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens so that
possible therapeutic sites can be identified in them. The same method can
also be applied to other infectious diseases, and hence we can look forward
to a healthier world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
It is possible to use the sequence alignment generated by the database search
technique as the basis for the subsequent model production; however, more
sophisticated approaches have also been explored.
7 Molecular Docking
Introduction to Docking
Docking studies are molecular modelling studies that aim to find a proper
fit between a ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in
size. The interface between the two molecules tends to be flatter and
smoother than those in protein-ligand interactions. Protein-protein
interactions are usually more rigid: the interfaces of these interactions
cannot alter their conformation in order to improve binding and ease
movement. Conformational changes are limited by steric constraints, and the
interfaces are therefore said to be rigid.
Fig Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described
by a lock-and-key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible
receptor or a flexible ligand and a rigid receptor.
Fig Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of the rigid ligand and flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows a
receptor to bind a larger than usual ligand. Normally, when there is ligand
overlap in the docking interface, energy penalties are incurred. If the van
der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than would be
possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of
the system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though: there is intrinsic
movement which allows small conformational adaptations for ligand binding.
When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing
easier, more energetically favorable binding between the two.
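The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be illustrated with a minimal sketch. The function names and the order of rotations (x, then y, then z) are assumptions chosen for illustration; docking programs use their own conventions.

```python
import math


def rotate_x(p, angle):
    """Rotate point p about the x axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)


def rotate_y(p, angle):
    """Rotate point p about the y axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def rotate_z(p, angle):
    """Rotate point p about the z axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def place_ligand(coords, rotation, translation):
    """Apply three rotations (about x, y, z) and then a translation to every atom."""
    rx, ry, rz = rotation
    tx, ty, tz = translation
    placed = []
    for p in coords:
        q = rotate_z(rotate_y(rotate_x(p, rx), ry), rz)
        placed.append((q[0] + tx, q[1] + ty, q[2] + tz))
    return placed


# Two hypothetical ligand atoms, rotated 90 degrees about z and moved 5 units along z
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = place_ligand(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
print(moved)
```

Because only rotations and translations are applied, all distances inside the ligand are preserved; this is what makes the transformation rigid.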
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study may
be useful in the search for a cure for the infectious disease bird flu, and
may also open new avenues for finding other possible drug targets in
influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or
binding site for antigens, antibodies or other cellular or immunological
components. It is a molecule within or on a cell surface to which a
substance (such as a hormone or a drug) selectively binds, causing a change
in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g. a receptor). As a
ligand binds through the interaction of many weak noncovalent bonds formed
with the binding site of a protein, the tight binding of a ligand depends
upon a precise fit to the surface-exposed amino acid residues on the
protein.
Active Site
The active site of a protein or enzyme is the region that binds the
substrates (and the cofactor, if any). It also contains the residues that
directly participate in the making and breaking of bonds. These residues
are called the catalytic groups. In essence, the interaction of the enzyme
and substrate at the active site promotes the formation of the transition
state. The active site is the region of the enzyme that most directly
lowers the activation free energy of the reaction, which results in the
rate enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a
protein active or functional site, as this greatly depends on the type of
function. With that in mind, below are preferences for the 20 amino acids
to lie within functional regions on proteins. These were worked out by
considering how often particular amino acids were in contact with bound
non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by
chance; negative values mean that it makes fewer. The values below do not
include protein-protein or protein-peptide interactions, where many of the
amino acids with negative values (e.g. tryptophan or proline) can play
critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
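The preference values can be put to work with a small lookup table. The sketch below ranks the amino acids and averages the preference over a set of residues; it assumes the extracted figures are three-decimal values (for example, His read as 0.360), and the helper names rank_residues and mean_propensity are hypothetical.

```python
# Preference values from the list above; the decimal placement is an
# assumption about the extracted figures (e.g. His taken as 0.360).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def rank_residues(table):
    """Return residue names sorted from most to least favoured."""
    return sorted(table, key=table.get, reverse=True)


def mean_propensity(residues, table=PROPENSITY):
    """Average preference over a list of three-letter residue names."""
    return sum(table[r] for r in residues) / len(residues)


ranking = rank_residues(PROPENSITY)
print("Most favoured: ", ranking[:3])
print("Least favoured:", ranking[-3:])

# A hypothetical candidate pocket lined by His, Cys and Ser
pocket_score = mean_propensity(["His", "Cys", "Ser"])
print("Mean preference of pocket:", round(pocket_score, 3))
```

A positive mean suggests that a group of residues resembles known ligand-contacting sites; it is a rough indicator only and does not replace structural analysis.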
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino
acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide the main-chain
N-Cα and Cα-C bonds are relatively free to rotate; these rotations are
described by the torsion angles φ and ψ, respectively, and the plot is
drawn between these two angles. Ramachandran used computer models of small
polypeptides to systematically vary φ and ψ with the objective of finding
stable conformations. For each conformation the structure was examined for
close contacts between atoms. Atoms were treated as hard spheres with
dimensions corresponding to their van der Waals radii. The angles that
cause spheres to collide correspond to sterically disallowed conformations
of the polypeptide backbone.
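Each point on a Ramachandran plot is a pair of torsion angles, and a torsion angle is computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The following is a minimal sketch of that calculation using only the standard library; the function names are hypothetical and the coordinates in the example are made up to give an easily checked result.

```python
import math


def sub(a, b):
    """Vector a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Scalar product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Vector product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)          # normal of the plane p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the last atom is rotated a quarter turn about the
# central bond relative to the first atom.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Applied to every residue of a structure, the resulting (φ, ψ) pairs are the points that a Ramachandran plot displays.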
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and
assessing how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on
how many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond
angles, as derived from a recent and comprehensive analysis of
small-molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties, by residue, in a
printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it cannot pass through high energy
barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms that are too close to each other in space and therefore
have a huge van der Waals repulsion energy. It is often preferable to carry
out energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
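The idea of gradient optimization can be shown on the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones potential. This is a sketch under stated assumptions, not the force field or minimizer used in this work: the potential is in reduced units (epsilon = sigma = 1), and the step size, the cap on the move per iteration and the function names are illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=100000):
    """Steepest descent on r; each move is capped to avoid overshooting."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        move = -step * gradient                      # go downhill in energy
        move = max(-max_move, min(max_move, move))   # cap the displacement
        r += move
    return r


r_start = 0.95                  # a bad contact: atoms too close
r_min = minimize(r_start)
print("start:", r_start, "energy:", round(lj_energy(r_start), 3))
print("final:", round(r_min, 4), "energy:", round(lj_energy(r_min), 3))
```

The minimizer only walks downhill from its starting point, which is why, on a real molecule with many minima, it ends in the nearest local minimum rather than the global one.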
Results and discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33       6.3%
Asn (N)   11       2.1%
Asp (D)   21       4.0%
Cys (C)    5       1.0%
Gln (Q)   32       6.1%
Glu (E)   28       5.4%
Gly (G)   51       9.8%
His (H)   16       3.1%
Ile (I)   15       2.9%
Leu (L)   81      15.5%
Lys (K)   10       1.9%
Met (M)   12       2.3%
Phe (F)   10       1.9%
Pro (P)   28       5.4%
Ser (S)   27       5.2%
Thr (T)   22       4.2%
Trp (W)    4       0.8%
Tyr (Y)    5       1.0%
Val (V)   38       7.3%
Pyl (O)    0       0.0%
Sec (U)    0       0.0%
(B)        0       0.0%
(Z)        0       0.0%
(X)        0       0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
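The extinction coefficients reported above can be reproduced from the Trp, Tyr and Cys counts. The sketch below assumes the per-residue molar extinction values used in ProtParam's published method (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1 at 280 nm) and reads the molecular weight as 55252.6; the function names are hypothetical.

```python
# Per-residue contributions at 280 nm in M^-1 cm^-1 (assumed ProtParam values)
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient from residue counts.

    If all_cys_paired is True, every pair of Cys residues is counted as
    one cystine; otherwise Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def abs_01_percent(ext, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext / molecular_weight


# Counts and molecular weight taken from the ProtParam report above
N_TRP, N_TYR, N_CYS = 4, 5, 5
MOL_WEIGHT = 55252.6

ext_paired = extinction_coefficient(N_TRP, N_TYR, N_CYS, all_cys_paired=True)
ext_free = extinction_coefficient(N_TRP, N_TYR, N_CYS, all_cys_paired=False)
print(ext_paired, round(abs_01_percent(ext_paired, MOL_WEIGHT), 3))
print(ext_free, round(abs_01_percent(ext_free, MOL_WEIGHT), 3))

# The atom total is the sum of the atomic composition
total_atoms = 2435 + 3967 + 711 + 719 + 17
print(total_atoms)
```

With these assumptions the script gives the same coefficients, absorbances and atom total as listed in the report.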
Secondary structure prediction
By SOPMA: result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541output
6. Alignment file: clustalw2-20080510-09552541aln
7. Guide tree file: clustalw2-20080510-09552541dnd
8. Your input file: clustalw2-20080510-09552541input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
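The pairwise scores in the table can be summarized by grouping sequences whose score reaches a chosen cut-off. The following is a minimal sketch using single-linkage grouping; the cut-off of 60 and the function name group_by_score are assumptions made for illustration and are not part of the ClustalW2 output.

```python
# Sequence names in the order used in the scores table (1 to 10)
NAMES = [
    "GLCTK_HUMAN", "HBEAG_ASHV", "HBEAG_DHBV1", "HBEAG_DHBV3", "HBEAG_GSHV",
    "HBEAG_HBVA2", "HBEAG_HBVA3", "HBEAG_HBVA4", "HBEAG_HBVA5", "HBEAG_HBVA6",
]

# Upper triangle of the table: row i holds the scores of sequence i+1
# against every later sequence.
SCORES = [
    [3, 3, 3, 3, 3, 3, 3, 3, 2],
    [21, 21, 91, 66, 65, 65, 65, 65],
    [97, 24, 26, 27, 25, 26, 26],
    [25, 26, 27, 25, 26, 26],
    [70, 69, 69, 69, 69],
    [98, 98, 98, 98],
    [98, 97, 98],
    [97, 98],
    [97],
]


def group_by_score(names, scores, threshold):
    """Single-linkage grouping: join two sequences if their score >= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the representative of i's group
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, row in enumerate(scores):
        for offset, score in enumerate(row):
            j = i + 1 + offset
            if score >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for index, name in enumerate(names):
        groups.setdefault(find(index), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


groups = group_by_score(NAMES, SCORES, threshold=60)
for members in groups:
    print(members)
```

With this cut-off the mammalian hepadnavirus sequences form one group, the two duck hepatitis B virus sequences form a second, and the human glycerate kinase remains on its own, in line with its scores of 2 to 3 against every viral sequence.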
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
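The guide tree can be read programmatically. The following is a minimal sketch, not part of the original workflow. It assumes the tree is a Newick string in which the colons, commas and decimal points lost in extraction have been restored (branch lengths such as 059519 read as 0.59519). The function name `leaf_branch_lengths` is hypothetical.

```python
import re

# Guide tree as a Newick string. The punctuation is an assumption: the
# extracted text lost ':' ',' and '.', which are restored here.
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    # A leaf label directly follows '(' or ',' and is followed by ':length'.
    # Internal branch lengths follow ')' and are therefore not matched.
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


if __name__ == "__main__":
    lengths = leaf_branch_lengths(NEWICK)
    # Print leaves from the longest terminal branch to the shortest.
    for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
        print(f"{name:20s} {length:.5f}")
```

Under these assumptions the human glycerate kinase sequence has by far the longest terminal branch, consistent with it being the outlier in the alignment.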
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
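The identity figure used to choose a template can be reproduced from any pair of aligned rows. The sketch below is one common definition (identical residues divided by the columns where neither sequence has a gap). It is an assumption, because the report does not say which definition its tools used, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Two rows taken from one block of the alignment (HBVA4 and ASHV).
    hbva4 = "--LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR"
    ashv = "--LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR"
    print(f"{percent_identity(hbva4, ashv):.1f}% identity over this block")
```

Other definitions divide by the alignment length or by the shorter sequence length, so values from different tools are not always directly comparable.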
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file).
Choose Swiss Model - Load Raw Sequence to load the raw target sequence.
Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss Model - Submit Modeling Request. (A new browser window opens; load the PDB file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss Model and select the Load Raw Sequence option to load the target molecule.
Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "Submit Modeling Request" under Swiss Model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows which combinations of φ and ψ are possible for a polypeptide, and displays the φ and ψ conformational angles of each residue in the protein. Because φ and ψ are the rotation angles about the two backbone bonds on either side of each alpha carbon atom, the plot shows whether the backbone geometry between successive alpha carbon atoms of the protein is stereochemically reasonable.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
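The φ and ψ angles plotted in a Ramachandran plot are ordinary dihedral angles between four backbone atoms. As a minimal sketch (not the code used by SAVS or PROCHECK), the function below computes a signed dihedral from four Cartesian coordinates. By the usual convention φ of residue i uses C(i-1), N(i), CA(i), C(i) and ψ uses N(i), CA(i), C(i), N(i+1). The function name and the test coordinates are illustrative assumptions.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from the modeled structure.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to every residue of a PDB file, the resulting (φ, ψ) pairs are the points that the plot sorts into favoured, allowed and disallowed regions.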
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score   %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
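The percentages in the plot statistics follow directly from the residue counts. The short sketch below recomputes them from the four counts reported above and compares the most favoured fraction with the 90% guideline. The dictionary keys and the function name are illustrative choices, not PROCHECK identifiers.

```python
# Residue counts per Ramachandran region, as reported in the PROCHECK summary.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

GOOD_MODEL_THRESHOLD = 90.0  # per cent in most favoured regions


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each region (1 decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


if __name__ == "__main__":
    percentages = region_percentages(REGION_COUNTS)
    for region, value in percentages.items():
        print(f"{region:20s} {value:5.1f}%")
    meets = percentages["most favoured"] > GOOD_MODEL_THRESHOLD
    print("meets the 90% guideline:", meets)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, which is worth stating alongside the plot.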
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different host species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
As studied here, the HBeAg (glycerate kinase) protein encoded by the gene
is one of the secondary reasons for the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Fig. Protein-Protein docking
Protein Receptor-Ligand docking
Protein receptor-ligand motifs fit together tightly and are often described
as a lock and key mechanism. There is both high specificity and induced fit
within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand with a flexible
receptor or a flexible ligand with a rigid receptor.
Fig. Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand and flexible receptor often
maximizes the interface area between the molecules. They move with respect
to one another in a direction perpendicular to the interface. This allows a
receptor to bind a larger than usual ligand. Normally, when there is ligand
overlap in the docking interface, energy penalties are incurred. If the
van der Waals forces can be decreased, energy loss in the system will be
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow docking of a larger ligand than would be
possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of the
system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though: there is intrinsic
movement that allows small conformational adaptations for ligand binding.
When the six degrees of freedom for protein movement (three rotational,
three translational) are taken into consideration, the inherent flexibility
allowed to the receptor is even greater. This further offsets any energy
penalty between the receptor and ligand, allowing easier, more energetically
favorable binding between the two.
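The six rigid-body degrees of freedom can be illustrated with a minimal Python sketch. It is not taken from any docking package: the function names, the made-up three-atom "ligand" and the rotation order (x, then y, then z, then translation) are assumptions chosen for illustration. A rigid move changes the position and orientation of the ligand but leaves its internal distances unchanged.

```python
import math

def rotate_x(p, a):
    """Rotate point p about the x axis by angle a (radians)."""
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (x, c * y - s * z, s * y + c * z)

def rotate_y(p, a):
    """Rotate point p about the y axis by angle a (radians)."""
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x + s * z, y, -s * x + c * z)

def rotate_z(p, a):
    """Rotate point p about the z axis by angle a (radians)."""
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def rigid_transform(coords, rotation, translation):
    """Apply three rotations (about x, y, z in that order) and then a
    translation to every point: six degrees of freedom in total."""
    rx, ry, rz = rotation
    tx, ty, tz = translation
    moved = []
    for p in coords:
        x, y, z = rotate_z(rotate_y(rotate_x(p, rx), ry), rz)
        moved.append((x + tx, y + ty, z + tz))
    return moved

# Hypothetical three-atom "ligand" (coordinates in angstroms, made up).
ligand = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
moved = rigid_transform(ligand, (0.3, -0.7, 1.2), (5.0, -2.0, 4.0))

# Internal geometry is preserved by a rigid-body move.
print(math.dist(ligand[0], ligand[1]), math.dist(moved[0], moved[1]))
```

A rigid docking search samples these six parameters and scores each pose; flexible docking additionally lets internal coordinates of the ligand or receptor change.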
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking study will
be useful in the search for a cure for the infectious disease bird flu, and
they will also open new avenues for finding other possible drug targets in
influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A receptor is a residue on the surface of the cell that serves as a
recognition or binding site for antigens, antibodies, or other cellular or
immunological components. It is a molecule within a cell or on its surface
to which a substance (such as a hormone or a drug) selectively binds,
causing a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g. a receptor).
Because a ligand binds through many weak noncovalent bonds formed with the
binding site of a protein, tight binding of a ligand depends on a precise
fit to the surface-exposed amino acid residues of the protein.
Active Site
The active site of a protein or enzyme is the region that binds the
substrates (and the cofactor, if any). It also contains the residues that
directly participate in the making and breaking of bonds. These residues are
called the catalytic groups. In essence, the interaction of the enzyme and
substrate at the active site promotes the formation of the transition state.
The active site is the region of the enzyme that most directly lowers the
free energy of activation (ΔG‡) of the reaction, which results in the rate
enhancement characteristic of enzyme action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active or functional site, as this greatly depends on the type of function.
With that in mind, below are preferences for the 20 amino acids to lie within
functional regions of proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues
in a protein structure. It shows the possible conformations of the φ and ψ
angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C
bonds are relatively free to rotate; these rotations are described by the
torsion angles φ and ψ, respectively, and the plot is drawn between them.
Ramachandran used computer models of small polypeptides to systematically
vary φ and ψ with the objective of finding stable conformations. For each
conformation, the structure was examined for close contacts between atoms.
Atoms were treated as hard spheres with dimensions corresponding to their
van der Waals radii. Angles that cause spheres to collide therefore
correspond to sterically disallowed conformations of the polypeptide
backbone.
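As an illustration of how a backbone torsion angle is obtained from atomic coordinates, the following Python sketch computes the signed dihedral angle defined by four points. For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). The function name and the test coordinates are assumptions made for this example, and this is a generic geometric formula rather than the code used by any particular validation program.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates: four points with the outer bonds at right angles.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying such a function to every residue of a structure yields the (φ, ψ) pairs that are plotted against each other in a Ramachandran plot.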
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects, the server
can take several minutes to run. The run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file is "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file, which has a file extension of .new. The new file has the
atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises a number of plots together with a detailed
residue-by-residue listing. PROCHECK generates a number of output files in
the default directory, which have the same name as the original PDB file but
with different extensions. The residue-by-residue listing has a .out
extension and lists all the computed stereochemical properties, by residue,
in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through
high energy barriers and so stops in a local minimum.
The potential energy calculated by summing the energies of the various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
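The idea of gradient optimization can be sketched on the simplest possible case: two atoms interacting through a Lennard-Jones potential and starting too close together. This is only a toy illustration, not the force field or minimizer used in this work; the reduced units (epsilon = sigma = 1), the fixed step size and the function names are all assumptions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r0, step=0.001, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is ~0."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_start = 1.0                 # atoms too close: strong repulsion
r_min = minimize(r_start)
print(lj_energy(r_start), lj_energy(r_min), r_min)
```

Because each step only follows the local downhill direction, the procedure ends in the minimum nearest to the starting geometry, which is why minimization alone cannot cross high energy barriers.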
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid                    206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
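The composition percentages and extinction coefficients reported above can be cross-checked with a short Python sketch. It assumes the commonly used per-residue molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6 (the placement of the decimal point is an assumption); the function names are illustrative and are not part of ProtParam.

```python
# Residue counts taken from the ProtParam output for GLCTK_HUMAN.
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}
MOLECULAR_WEIGHT = 55252.6  # assumed decimal placement

def composition_percent(counts):
    """Percentage of each residue type, rounded to one decimal place."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}

def extinction_coefficient(counts, cystines=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1).

    With cystines=True every pair of Cys residues is assumed to form
    one cystine; with cystines=False all Cys are assumed reduced.
    """
    n_cystine = counts["Cys"] // 2 if cystines else 0
    return 1490 * counts["Tyr"] + 5500 * counts["Trp"] + 125 * n_cystine

def absorbance_1_g_per_l(ext, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext / molecular_weight

percent = composition_percent(COUNTS)
ext_all = extinction_coefficient(COUNTS, cystines=True)
ext_none = extinction_coefficient(COUNTS, cystines=False)
print(sum(COUNTS.values()), percent["Leu"], ext_all, ext_none)
print(round(absorbance_1_g_per_l(ext_all, MOLECULAR_WEIGHT), 3))
```

The charged-residue totals follow the same way: Asp + Glu gives the negatively charged count and Arg + Lys the positively charged count.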
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
310 helix (Gg):          0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states:        0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
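Pairwise scores of this kind are percent identities. As a hedged illustration, the sketch below counts identical residues between two aligned sequences and divides by the number of positions compared, skipping any column that contains a gap. That definition is assumed here to match the ClustalW pairwise score, and the function name is made up. The two fragments are the first aligned residues of P17099|HBEAG_HBVA4 and Q64896|HBEAG_ASHV from the multiple alignment, so the value differs from the full-length score in the table.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Aligned N-terminal fragments copied from the multiple alignment.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

Low scores such as those between GLCTK_HUMAN and the HBeAg sequences indicate that hardly any compared positions are identical.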
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
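The guide tree is a nested, parenthesised (Newick-style) description of the clustering order. The Python sketch below parses such a string and sums the branch lengths from the root to each sequence. It assumes that the stripped punctuation was `:` before each branch length, `,` between siblings and a decimal point after the first digit of each length; the parser is a minimal illustration with made-up function names, not a replacement for a phylogenetics library.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                      # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()

def leaf_depths(node, depth=0.0, out=None):
    """Map each leaf name to the summed branch length from the root."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```

Under these assumptions the human glycerate kinase sits on by far the longest path from the root, consistent with its very low pairwise scores against the HBeAg sequences.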
Phylogram
Tertiary structure prediction
The PDB entry 1QGT-C, which showed around 85.6% identity with the target sequence, was selected as the template, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the protein of interest, can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler was given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Score    %-age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
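The percentages in the plot statistics, and the 90% rule of thumb quoted with them, can be reproduced with a short Python sketch. The function and key names are illustrative assumptions; only the four residue counts are taken from the PROCHECK summary.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed):
    """Percentage of non-Gly, non-Pro residues in each plot region."""
    counts = {
        "most favoured": most_favoured,
        "additional allowed": additional,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())
    percent = {region: round(100.0 * n / total, 1)
               for region, n in counts.items()}
    return total, percent

def is_good_quality(percent, threshold=90.0):
    """Apply the 'over 90% in the most favoured regions' rule of thumb."""
    return percent["most favoured"] > threshold

total, percent = ramachandran_summary(990, 104, 11, 51)
print(total, percent, is_good_quality(percent))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% threshold expected of a good quality model.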
Docking result (Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will become possible in the near future to draw firm conclusions about
the true picture of evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of proteins present in the different
species, we used homology modelling and present a theoretical model built
with available online modelling tools.
Our study indicates that glycerate kinase (HBeAg-binding protein 4), the
protein encoded by this gene, is one of the secondary causes of HBV
pathogenicity. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than
would be possible with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced,
the receptor can retain its rigidity while maintaining the free energy of
the system. For successful docking, the parameters of the ligand need to be
maintained, and the ligand must be slightly smaller than the receptor
interface. No docking is completely rigid, though: there is intrinsic
movement which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken
into consideration (three rotational, three translational), the amount of
inherent flexibility allowed to the receptor is even greater. This further
offsets any energy penalty between the receptor and ligand, allowing for
easier, more energetically favorable binding between the two.
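The six rigid-body degrees of freedom mentioned above can be made concrete with a short sketch. The function names, the x-then-y-then-z rotation order and the layout of the pose tuple are illustrative assumptions; they are not taken from any docking program used in this work.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Apply a pose (rx, ry, rz, tx, ty, tz) to a list of (x, y, z) atoms.

    The first three values are the rotational degrees of freedom,
    the last three are the translational ones.
    """
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for atom in coords:
        moved.append(tuple(
            sum(rot[i][j] * atom[j] for j in range(3)) + shift[i]
            for i in range(3)
        ))
    return moved


# Hypothetical two-atom "ligand": rotate 90 degrees about z, then translate.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0))
```

Because the transformation is rigid, the distance between the two atoms is unchanged; a docking search of this kind only varies the six pose parameters.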
Aim of docking
The aim of docking is to identify new drug targets, which will open new
vistas for further drug development. The findings of our docking will be
useful in finding a cure for the infectious disease bird flu, and will also
open new avenues for finding other possible drug targets in influenza A
virus. The docking results can be used to design new lead compounds and
hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components.
It is a molecule within a cell or on the cell surface to which a substance
(such as a hormone or a drug) selectively binds, causing a change in the
activity of the cell.
Ligand
A molecule that binds to a protein molecule (e.g. a receptor). Because a
ligand binds through many weak noncovalent interactions with the binding
site of a protein, tight binding of a ligand depends upon a precise fit to
the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein or enzyme is the region that binds the
substrates (and the cofactor, if any). It also contains the residues that
directly participate in the making and breaking of bonds. These residues are
called the catalytic groups. In essence, the interaction of the enzyme and
substrate at the active site promotes the formation of the transition state.
The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate
enhancement characteristic of enzyme action.
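As a rough illustration of how a lower barrier becomes a rate enhancement, the sketch below assumes transition state theory, under which the ratio of catalysed to uncatalysed rate constants is exp(ΔΔG‡ / RT). The text does not name a rate model, so this choice, the function name and the default temperature are assumptions.

```python
import math

# Gas constant in kJ/(mol K).
R_KJ = 8.314462618e-3


def rate_enhancement(barrier_reduction_kj_per_mol, temperature_k=298.15):
    """Ratio k_cat / k_uncat for a given reduction of the activation barrier.

    Assumes transition state theory with identical pre-exponential factors,
    so only the change in activation free energy matters.
    """
    return math.exp(barrier_reduction_kj_per_mol / (R_KJ * temperature_k))


# Barrier reduction that gives a tenfold speed-up at 25 degrees C (RT ln 10).
tenfold = R_KJ * 298.15 * math.log(10)
```

Under these assumptions each additional RT ln 10 of barrier reduction (a little under 6 kJ/mol at room temperature) multiplies the rate by ten, which is why modest changes in transition-state binding can produce very large accelerations.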
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a
protein active or functional site, as this depends greatly on the type of
function. With that in mind, below are the preferences of the 20 amino acids
to lie within functional regions of proteins. These were worked out by
considering how often particular amino acids were in contact with bound
non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by
chance; negative values mean that it makes fewer. The values below do not
include protein-protein or protein-peptide interactions, where many of the
amino acids with negative values (e.g. tryptophan or proline) can play
critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
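The preference values above can be used as a simple lookup table. The sketch below assumes that the values are three-decimal numbers (so that 0360 reads as 0.360), and the averaging function is a hypothetical illustration, not a published scoring scheme.

```python
# Preference of each amino acid to lie in a functional region,
# read from the table above with decimal points assumed.
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def ranked_residues():
    """Residue names ordered from most to least preferred."""
    return sorted(PREFERENCE, key=PREFERENCE.get, reverse=True)


def mean_preference(residues):
    """Average preference of a candidate site (hypothetical helper)."""
    return sum(PREFERENCE[name] for name in residues) / len(residues)


ranking = ranked_residues()
site_score = mean_preference(["His", "Cys", "Ser"])
```

A candidate site rich in His, Cys and Ser scores well above zero, whereas one dominated by Leu and Pro scores below zero, in line with the interpretation of positive and negative values given in the text.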
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the backbone dihedral angles phi (φ) against psi (ψ) of the amino
acid residues in a protein structure. It shows the possible conformations of
the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα
and Cα-C bonds are relatively free to rotate, and the plot is drawn between
the torsion angles about these bonds, φ and ψ. Ramachandran used computer
models of small polypeptides to vary φ and ψ systematically, with the
objective of finding stable conformations. For each conformation, the
structure was examined for close contacts between atoms. Atoms were treated
as hard spheres with dimensions corresponding to their van der Waals radii,
and the angles which cause spheres to collide correspond to sterically
disallowed conformations of the polypeptide backbone.
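The torsion angles plotted in a Ramachandran diagram are each defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal sketch of the calculation follows; the helper names and the test coordinates are illustrative assumptions rather than part of any validation program used here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a planar cis and a planar trans arrangement.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

Applying this function to every residue of a structure gives the (φ, ψ) pairs that are then placed on the plot.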
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use, the
server can take several minutes to run. The run time also depends on how
many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension
and lists all the computed stereochemical properties by residue in a
printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it will not pass through high energy
barriers and so stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms that are too close to each other in space and so have a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby conformation.
Energy minimization is usually performed by gradient optimization: atoms are
moved so as to reduce the net forces on them. The minimized structure has
small forces on each atom and therefore serves as an excellent starting
point for molecular dynamics simulations.
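The idea of moving atoms down the energy gradient can be shown with the smallest possible case: two atoms that start too close together and interact through a Lennard-Jones (van der Waals) term. This is a sketch under stated assumptions; the reduced units (ε = σ = 1), the fixed step size and the function names are illustrative, and real minimizers work on full force fields with more robust optimizers.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.005, tol=1e-8, max_iter=10000):
    """Steepest descent on the separation r until the force is below tol."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad  # move against the gradient, i.e. along the force
    return r


start = 0.95                 # atoms too close: strong repulsion
relaxed = minimize(start)    # expected to settle near 2 ** (1 / 6)
```

Starting from the clashing distance, the energy is positive and dominated by repulsion; after minimization the separation sits at the bottom of the well, where the force is essentially zero. The search only finds the nearest minimum, which is the limitation described in the text.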
Results and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
Blat result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C(Hbcag)Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain CHuman T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
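The two extinction coefficients above can be reproduced from the amino acid composition. The sketch assumes the molar extinction values ProtParam documents for 280 nm in water (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and a molecular weight read as 55252.6; the function names are illustrative.

```python
# Molar extinction values at 280 nm in water (M-1 cm-1), as assumed above.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Extinction coefficient from Trp, Tyr and Cys counts.

    With cystines=True every pair of Cys residues is counted as one
    cystine; an unpaired Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if cystines else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / molecular_weight


# Counts reported for GLCTK_HUMAN: 4 Trp, 5 Tyr, 5 Cys.
ext_all_cystine = extinction_coefficient(4, 5, 5, cystines=True)
ext_no_cystine = extinction_coefficient(4, 5, 5, cystines=False)
abs_all_cystine = absorbance_1_g_per_l(ext_all_cystine, 55252.6)
abs_no_cystine = absorbance_1_g_per_l(ext_no_cystine, 55252.6)
```

With these inputs the calculation gives 29700 and 29450 for the coefficients and 0.538 and 0.533 for the absorbances after rounding, matching the reported values.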
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
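The pairwise scores in the table behave like percent identities: near 100 for the human HBV strains against each other, and about 3 for glycerate kinase against any HBeAg sequence. A simplified sketch of such a score is given below. It assumes that the score is the number of identical residues divided by the number of positions where neither sequence has a gap; the exact formula used by ClustalW2 is not stated in this report, and the sequences in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not taken from the alignment in this report.
score = percent_identity("ACD-FGHK", "ACE-FG-K")
```

In the example, six columns are compared and five are identical, so the score is a little over 83.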
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
open the template structure from file (pdb file)
choose icon - Swiss model - load the raw target sequence
choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
choose icon - File - Save - Layer (pdb)
choose icon - File - Save - Project (pdb)
choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
open the new structure (received by e-mail) - remove the template by selecting the target
choose icon - File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, with one point plotted for each residue in the protein. The rotation angles about the two backbone bonds at each alpha carbon atom of the protein of interest can be read from this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
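To make the φ and ψ angles concrete, the following is a minimal Python sketch of how a backbone dihedral angle can be computed from four atomic positions. It is not the code used by SAVS or PROCHECK; the function names `dihedral`, `phi` and `psi` are hypothetical, and the coordinates are assumed to be plain (x, y, z) tuples in Å taken from a PDB file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Each residue then contributes one (φ, ψ) point to the plot; the first residue has no φ and the last has no ψ, because the neighbouring atom is missing.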
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          No. of residues   %age
Residues in most favoured regions [A,B,L]                      990          85.6
Residues in additional allowed regions [a,b,l,p]               104           9.0
Residues in generously allowed regions [~a,~b,~l,~p]            11           1.0
Residues in disallowed regions                                  51           4.4
                                                              ----         -----
Number of non-glycine and non-proline residues                1156         100.0
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127 (59)
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
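The percentages in the PROCHECK summary follow directly from the residue counts. As a small illustrative sketch (the variable names are hypothetical, and the decimal points are read as 85.6, 9.0, 1.0 and 4.4 per cent), the arithmetic and the 90 per cent quality check can be reproduced as follows:

```python
# Residue counts taken from the Ramachandran plot statistics above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
end_residues = 8
glycines = 127
prolines = 60

# Percentages are relative to the non-glycine, non-proline residues only.
non_gly_pro = sum(region_counts.values())
percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}
total_residues = non_gly_pro + end_residues + glycines + prolines

# PROCHECK's rule of thumb: over 90% in the most favoured regions.
meets_quality_threshold = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {region_counts[region]:5d} {pct:5.1f}%")
print("total residues:", total_residues)
print("over 90% most favoured:", meets_quality_threshold)
```

With 85.6 per cent of residues in the most favoured regions, the model falls short of the 90 per cent guideline quoted in the summary.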
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
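The size of the search space reported in the log can be checked by hand. The sketch below assumes that the stripped log entries read Iterate[27,812,1] and FFT[64,24,48], that is, 27 radial steps, 812 receptor rotations and 1 ligand rotation, each evaluated on a 64 x 24 x 48 FFT grid. This reading is an assumption, and the variable and function names are hypothetical, but it reproduces both the 21924 FFTs and the 1616412672 orientations quoted in the log.

```python
def radial_steps(r_min, r_max, step):
    """Intermolecular distances sampled from r_min to r_max inclusive."""
    count = int(round((r_max - r_min) / step)) + 1
    return [round(r_min + i * step, 2) for i in range(count)]


# Distance range 0.00 to 19.50 in steps of 0.75, as stated in the log.
distances = radial_steps(0.0, 19.5, 0.75)

receptor_rotations = 812   # assumed reading of "Receptor 812"
ligand_rotations = 1       # assumed reading of "Ligand 1"
fft_grid = (64, 24, 48)    # assumed reading of "FFT[642448]"

# One 3D FFT per (distance, receptor rotation, ligand rotation) triple.
n_ffts = len(distances) * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print(len(distances), "radial steps")
print(n_ffts, "3D FFTs")
print(total_orientations, "orientations in total")
```

The timing line is consistent in the same way: 53.66 + 73.01 + 277.87 + 15.45 seconds is about 420 seconds, the 7 min 0 sec reported for the 3D search.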
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, they play an important
role in survival in different host species. It would be interesting to take a closer
look at the matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that
they evolved together.
With the completion of the ongoing gene sequencing
project on HBV, we hope it will be possible in the near future to draw firm conclusions
about the true picture of its evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein
present in the different species, we carried out homology modelling and
present a theoretical model built with available online modelling tools.
In our study, the HBeAg (glycerate kinase)
protein encoded by the gene is one of the secondary causes of the pathogenicity
of HBV. We therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and
biotechnologists develop drugs more efficiently and more
effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for
further drug development. The findings of our docking will be useful in
the search for a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can also
be applied to other infectious diseases, and hence we can look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Receptor
A structure on the surface of the cell that serves as a recognition or binding
site for antigens, antibodies or other cellular or immunological components. It
is a molecule within a cell or on its surface to which a substance (such as a hormone or
a drug) selectively binds, causing a change in the activity of the cell.
Ligand
The molecule that binds to a protein molecule (e.g. a receptor). As a ligand
binds through many weak noncovalent bonds formed with
the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein (enzyme) is the region that binds the substrates
(and the cofactor, if any). It also contains the residues that directly
participate in the making and breaking of bonds. These residues are called
the catalytic groups. In essence, the interaction of the enzyme and substrate
at the active site promotes the formation of the transition state. The active
site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the
reaction, which results in the rate enhancement characteristic of enzyme
action.
Amino acids in protein active sites
It is difficult to generalize which amino acids are likely to be in a protein
active (functional) site, as this greatly depends on the type of function. With
that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The values below do not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
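As an illustration of how the preference values can be used, the short Python sketch below ranks the 20 amino acids from most to least preferred in functional sites. The values are read from the table above with the decimal point restored after the first digit (for example 0360 as 0.360), which is an assumption about the original layout; the variable names are hypothetical.

```python
# Preference of each amino acid to contact bound non-protein atoms.
# Positive: more contacts than expected by chance; negative: fewer.
site_preference = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

# Rank from most to least preferred in functional sites.
ranked = sorted(site_preference.items(), key=lambda item: item[1], reverse=True)

enriched = [name for name, value in ranked if value > 0]
depleted = [name for name, value in ranked if value < 0]

for name, value in ranked:
    print(f"{name} {value:+.3f}")
```

On this reading, histidine and cysteine are the most enriched residues at ligand-binding sites, while proline and leucine are the most depleted.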
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a
Ramachandran diagram), developed by Gopalasamudram Narayana
Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of the
amino acid residues in a protein structure. It shows the possible conformations
of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain
N-Cα and Cα-C bonds are relatively free to rotate; these rotations are described by the
torsion angles φ and ψ respectively, and the plot is drawn between them. Ramachandran used computer
models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was
examined for close contacts between atoms. Atoms were treated as hard
spheres with
dimensions corresponding to their van der Waals radii. The angles
that cause spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
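The hard-sphere idea can be sketched in a few lines of Python. This is a simplified illustration, not Ramachandran's original procedure, which used tabulated contact distances for each pair of atom types. The radii below are commonly quoted van der Waals values in Å and are an assumption of this sketch, as are the function names.

```python
import math
from itertools import combinations

# Commonly quoted van der Waals radii in angstroms (assumed values).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def clashes(atom_a, atom_b, overlap_tolerance=0.0):
    """True if two hard spheres overlap by more than the tolerance."""
    element_a, position_a = atom_a
    element_b, position_b = atom_b
    distance = math.dist(position_a, position_b)
    contact_limit = VDW_RADII[element_a] + VDW_RADII[element_b] - overlap_tolerance
    return distance < contact_limit


def is_disallowed(atoms, bonded_pairs=frozenset()):
    """True if any pair of atoms that is not covalently bonded collides."""
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) in bonded_pairs or (j, i) in bonded_pairs:
            continue  # bonded atoms are always closer than the contact limit
        if clashes(atoms[i], atoms[j]):
            return True
    return False


# Two oxygen atoms 2.0 angstroms apart overlap (limit 3.04 angstroms).
example = [("O", (0.0, 0.0, 0.0)), ("O", (2.0, 0.0, 0.0))]
print(is_disallowed(example))
```

Scanning φ and ψ over a grid, rebuilding the coordinates for each pair of angles and applying such a test gives the allowed and disallowed regions of the plot.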
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects,
the server can take several minutes to run. The run time also depends on how many
residues there are in the protein that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinates file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension and
lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through
high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various
interactions is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out
energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
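The idea of gradient optimization can be sketched on a toy problem: two atoms interacting through a Lennard-Jones term, started too far apart and moved downhill until the force is negligible. This is only an illustration under stated assumptions. The reduced units (epsilon = sigma = 1), the step size and the function names are hypothetical, and this is not the force field or minimizer used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, step=0.01, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until it is ~0."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_min = minimize(1.5)           # hypothetical starting separation
print(round(r_min, 4), round(lj_energy(r_min), 4))
```

The search stops at the nearest point where the force vanishes, which for this single term is the analytical minimum at r = 2^(1/6) sigma. For a whole protein the same downhill procedure ends in whichever local minimum is closest to the starting conformation, which is why it cannot cross high energy barriers.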
Results and discussion
1. Swiss-Prot entry: Protein sequence, Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                     192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27  6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
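The two extinction coefficients can be checked by hand from the residue counts reported above (Trp 4, Tyr 5, Cys 5). The sketch below assumes the per-residue molar extinction values at 280 nm commonly used for this kind of estimate (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and reads the molecular weight as 55252.6 Da; the function names are hypothetical.

```python
# Assumed molar extinction values at 280 nm (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficient(n_trp, n_tyr, n_cys, cystines_formed=True):
    """Estimate the 280 nm extinction coefficient from residue counts."""
    # Each cystine is a pair of Cys residues; an odd Cys is left over.
    n_cystine = n_cys // 2 if cystines_formed else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

def absorbance_1g_per_l(ext, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext / molecular_weight

MW = 55252.6  # molecular weight as read from the ProtParam output
ext_all = extinction_coefficient(4, 5, 5, cystines_formed=True)
ext_none = extinction_coefficient(4, 5, 5, cystines_formed=False)
print(ext_all, round(absorbance_1g_per_l(ext_all, MW), 3))
print(ext_none, round(absorbance_1g_per_l(ext_none, MW), 3))
```

Under these assumptions the sketch reproduces both reported pairs of values, which also supports reading the molecular weight as 55252.6.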
Secondary structure prediction
By SOPMA
Result for: UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):      235 is 44.93%
3-10 helix (Gg):         0 is  0.00%
Pi helix (Ii):           0 is  0.00%
Beta bridge (Bb):        0 is  0.00%
Extended strand (Ee):   80 is 15.30%
Beta turn (Tt):         36 is  6.88%
Bend region (Ss):        0 is  0.00%
Random coil (Cc):      172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
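The Score column can be read as a pairwise percent identity. As an assumption (the exact definition used by ClustalW2 is not stated in this report), the sketch below counts identical residues over the aligned columns in which neither sequence has a gap. The function name is hypothetical, and the two strings are the first 30 aligned columns of HBEAG_HBVA4 and HBEAG_HBVA5 as they appear in the alignment of this report.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 2))
```

This short segment differs at a single position (V versus F), which is in line with the high scores reported between the HBV sequences; the full-length score would need the complete alignment.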
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose icon: Swiss model - load the raw target sequence.
Choose icon: fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon: file - save - layer (pdb).
Choose icon: file - save - project (pdb).
Choose icon: Swiss model - submit modeling request. (A new browser window opens for loading the PDB file; give the email ID for receiving the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose icon: file - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays these backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                            Score   %age
Residues in most favoured regions [A,B,L]                    990   85.6%
Residues in additional allowed regions [a,b,l,p]             104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0%
Residues in disallowed regions                                51    4.4%
                                                            ----  ------
Number of non-glycine and non-proline residues              1156  100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
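The percentages in the plot statistics follow directly from the residue counts, and the 90% rule of thumb can be applied to them mechanically. The sketch below uses only the counts reported above; the function and variable names are hypothetical.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

def region_percentages(region_counts):
    """Percentage of non-Gly, non-Pro residues in each region."""
    total = sum(region_counts.values())
    return {name: round(100.0 * n / total, 1)
            for name, n in region_counts.items()}

percentages = region_percentages(counts)
non_gly_pro = sum(counts.values())
# End residues, glycines and prolines are counted separately.
total_residues = non_gly_pro + 8 + 127 + 60
meets_90_percent_rule = percentages["most favoured"] > 90.0

print(percentages)
print(non_gly_pro, total_residues, meets_90_percent_rule)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline quoted by PROCHECK.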
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV) we conclude
that, although they are all closely related, each plays an important role in
survival in its own host species. It is worth taking a closer look at the
matter by studying it at the gene level, where a phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of evolution in
the near future, and that the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To find the unknown structure of a protein present in the different species
we carried out homology modelling, and we took a step forward by presenting a
theoretical model built with available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase), which
is encoded by the GLYCTK gene, is one of the secondary causes of the
pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which
drugs can be developed.
Future prospects
The work presented in this report might be a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in finding a
cure for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can
therefore aid the drug discovery process.
Finally, a similar process can be applied to other pathogens so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and so we can look forward to
a better, disease-free world.
The work presented is only a small part of a big problem, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology,
6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental
Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia,
Pennsylvania 19111, USA. c_seegerfcccedu
[4] PubMed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia University,
New York, NY 10032, USA.
[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and
alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
[PubMed]
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W.
and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of
protein database search programs. Nucleic Acids Res. 25: 3389–3402.
[PubMed]
Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M.,
Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An
integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romanokroemersanofi-aventiscom
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
functional regions on proteins. These were worked out by considering how
often particular amino acids were in contact with bound non-protein atoms
in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values
mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran
diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to
visualize the backbone dihedral angles phi (φ) against psi (ψ) of the amino
acid residues in a protein structure. It shows the possible conformations of
the φ and ψ angles for a polypeptide. In a polypeptide the main-chain N-Cα
and Cα-C bonds are relatively free to rotate; these rotations are described
by the torsion angles φ and ψ respectively, and the plot is drawn between
these two angles. Ramachandran used computer models of small polypeptides to
systematically vary φ and ψ with the objective of finding stable
conformations. For each conformation the structure was examined for close
contacts between atoms. Atoms were treated as hard spheres with dimensions
corresponding to their van der Waals radii, and the angles that cause
spheres to collide correspond to sterically disallowed conformations of the
polypeptide backbone.
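The torsion angles described above can be computed directly from atomic coordinates. The following is a minimal sketch, not the method used by any particular validation server: the function name `dihedral` and the test points are illustrative assumptions. For φ the four points would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p2, p1)          # first bond vector
    b2 = sub(p3, p2)          # central bond, the rotation axis
    b3 = sub(p4, p3)          # last bond vector
    n1 = cross(b1, b2)        # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)        # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(dot(b2, b2))
    b2_unit = tuple(c / b2_len for c in b2)
    x = dot(n1, n2)
    y = dot(b2_unit, cross(n1, n2))
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: four points related by a quarter turn
# about the central bond.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Plotting the (φ, ψ) pair of every residue on axes running from -180 to 180 degrees gives the Ramachandran plot.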
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects, the server
can take several minutes to run; the run time also depends on how many
residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file is "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file with the file extension .new. In the new file the atoms are
labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but different
extensions. The residue-by-residue listing has a .out extension and lists
all the computed stereochemical properties by residue in a printable ASCII
text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it will not pass through high energy
barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms that are too near each other in space and have a huge
van der Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation. Energy
minimization is usually performed by gradient optimization: atoms are moved
so as to reduce the net forces on them. The minimized structure has small
forces on each atom and therefore serves as an excellent starting point for
molecular dynamics simulations.
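The idea of gradient optimization can be illustrated with a toy case: two atoms interacting through a Lennard-Jones (van der Waals) term, started too close together. This is only a sketch under stated assumptions; the reduced units (epsilon = sigma = 1), the step-size cap and the function names are illustrative choices and are not taken from any modelling package used in this report.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize(r, rate=0.01, max_step=0.05, tol=1e-6, max_iter=10000):
    """Steepest descent: move downhill until the force is almost zero."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        step = -rate * grad
        # Cap the displacement so a huge repulsive force cannot
        # throw the atoms far apart in a single step.
        step = max(-max_step, min(max_step, step))
        r += step
    return r

start = 0.9                      # clashing atoms, strongly repulsive
r_min = minimize(start)
print(lj_energy(start), lj_energy(r_min), r_min)
```

The search stops at the nearest point where the gradient vanishes, which is why minimization alone only finds the local minimum closest to the starting conformation.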
Result and discussion
1 Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)
BLAST result
List of potentially matching sequences
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids)
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water
Ext coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
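The two extinction coefficients reported above can be reproduced from the residue counts. The sketch below assumes the commonly used molar extinction values at 280 nm (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and a molecular weight of 55252.6, read from the ProtParam output with the decimal point that was lost in extraction put back; the function names are illustrative.

```python
# Assumed molar extinction values at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    coeff = n_trp * EXT_TRP + n_tyr * EXT_TYR
    if cystines:
        # Every pair of Cys residues is assumed to form one cystine.
        coeff += (n_cys // 2) * EXT_CYSTINE
    return coeff

def absorbance_1_g_per_l(coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return coeff / mol_weight

# Residue counts from the composition table: Trp 4, Tyr 5, Cys 5.
mol_weight = 55252.6  # assumed value, see the note above
with_cystines = extinction_coefficient(4, 5, 5, cystines=True)
without_cystines = extinction_coefficient(4, 5, 5, cystines=False)
print(with_cystines, round(absorbance_1_g_per_l(with_cystines, mol_weight), 3))
print(without_cystines, round(absorbance_1_g_per_l(without_cystines, mol_weight), 3))
```

With these assumptions the script gives 29700 and 29450 for the two cases, matching the values in the report.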
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA
Alpha helix (Hh): 235 is 44.93%
3_10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states: 0 is 0.00%
Other states: 0 is 0.00%
Parameters
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
==============================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
==============================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4      -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3      -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV       -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV       -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN      CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV       NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV       NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1      DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3      DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN      PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3      RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV       RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV       RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN      GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4      -------------------------------------------
Q91C37|HBEAG_HBVA6      -------------------------------------------
P0C692|HBEAG_HBVA2      -------------------------------------------
P0C625|HBEAG_HBVA3      -------------------------------------------
Q81105|HBEAG_HBVA5      -------------------------------------------
Q64896|HBEAG_ASHV       -------------------------------------------
P03153|HBEAG_GSHV       -------------------------------------------
P03154|HBEAG_DHBV1      -------------------------------------------
P0C6J9|HBEAG_DHBV3      -------------------------------------------
Q8IVS8|GLCTK_HUMAN      ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
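The pairwise scores in the scores table are percent identities between pairs of sequences. A minimal sketch of such a calculation on two already aligned rows is given below. It assumes that identities are counted over the columns in which neither sequence has a gap, which may differ in detail from what ClustalW2 does internally; the function name is illustrative, and the two fragments are the first aligned residues of HBEAG_HBVA4 and HBEAG_ASHV taken from the alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Fragments of two rows of the alignment (not the full sequences).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

Applied to complete aligned rows, values near 100 indicate almost identical sequences (as among the HBV strains), while values of only a few percent (as between glycerate kinase and the HBeAg sequences) indicate no detectable similarity.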
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
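The guide tree is written in Newick format, in which nested parentheses group sequences and the number after each colon is a branch length. The sketch below is a small hand-written parser, not a replacement for a phylogenetics library. It assumes that the colons, commas and decimal points lost from the extracted tree are restored as in the string `GUIDE_TREE`; the function names are illustrative.

```python
# Guide tree with the punctuation assumed to have been lost in extraction.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

def parse_newick(text):
    """Parse a Newick string into nested dicts (name, length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()

def root_to_tip(node, dist=0.0, out=None):
    """Sum branch lengths from the root down to every leaf."""
    if out is None:
        out = {}
    dist += node["length"]
    if not node["children"]:
        out[node["name"]] = dist
    for child in node["children"]:
        root_to_tip(child, dist, out)
    return out

distances = root_to_tip(parse_newick(GUIDE_TREE))
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(name, round(dist, 5))
```

Sorting the leaves by their distance from the root separates the closely grouped HBV strains from the much more distant human glycerate kinase.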
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - file - save - layer (pdb)
Choose icon - file - save - project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - file - save - layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title Lakshay project
Will be added to the results header
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit P000044 Title Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the ψ and φ backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the desired protein, can be determined using this plot.
Software
SAVS http://nihserver.mbi.ucla.edu/SAVS
Procedure
The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
SAVES results for proj_gunjanpdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                         Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
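The percentages in the plot statistics, and the comparison with the 90% guideline, can be checked with a few lines of code. This is a sketch using the residue counts reported above; the dictionary keys and function name are illustrative.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

def region_percentages(counts):
    """Percentage of non-glycine, non-proline residues in each region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}

percentages = region_percentages(region_counts)
meets_guideline = percentages["most favoured"] > 90.0
for region, value in percentages.items():
    print(region, value)
print("over 90% in most favoured regions:", meets_guideline)
```

For this model the most favoured regions hold 85.6% of the residues, so it falls short of the 90% expected of a good quality model.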
Docking result ---- by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Can't add all hydrogens to incomplete residue A 253LYS
Warning Can't add all hydrogens to incomplete residue A 316HIS
Warning Can't add all hydrogens to incomplete residue A 318LYS
Warning Can't add all hydrogens to incomplete residue A 393LYS
Warning Can't add all hydrogens to incomplete residue B 8LYS
Warning Can't add all hydrogens to incomplete residue B 9LYS
Warning Can't add all hydrogens to incomplete residue B 17LYS
Warning Can't add all hydrogens to incomplete residue B 34LYS
Warning Can't add all hydrogens to incomplete residue B 36ASN
Warning Can't add all hydrogens to incomplete residue B 62LYS
Warning Can't add all hydrogens to incomplete residue B 65ARG
Warning Can't add all hydrogens to incomplete residue B 66LYS
Warning Can't add all hydrogens to incomplete residue B 316HIS
Warning Can't add all hydrogens to incomplete residue B 370LYS
Warning Can't add all hydrogens to incomplete residue B 380TYR
Warning Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that in the near future it will be possible to draw firm conclusions about
the true picture of its evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling. We then went a step further and
presented a theoretical model built with available online modelling tools.
Our study suggests that the HBeAg-binding protein (glycerate kinase) is one
of the secondary causes of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. The same method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
dimensions corresponding to their van der Waals radii. The angles that
cause the spheres to collide correspond to sterically disallowed
conformations of the polypeptide backbone.
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing
how correct they are. Depending on how many programs one selects to use,
the server can take several minutes to run. The run time also depends on
how many residues there are in the submitted protein.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with
stereochemical parameters derived from well-refined, high-resolution
structures. The checks also make use of 'ideal' bond lengths and bond angles,
as derived from a recent and comprehensive analysis of small-molecule
structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the
protein structure. One of the by-products of running PROCHECK is that the
coordinate file will be "cleaned up" by the first of the programs. The
cleaning-up process corrects any mislabelled atoms and creates a new
coordinate file which has a file extension of .new. The new file will have
the atoms labelled in accordance with the IUPAC naming convention.
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. PROCHECK generates a number of output files in the default
directory, which have the same name as the original PDB file but with
different extensions. The residue-by-residue listing has a .out extension and
lists all the computed stereochemical properties by residue in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. Energy minimization is good
at releasing local constraints for a residue, but it will not pass through
high energy barriers and will stop in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number can
be used to evaluate a particular conformation, but it may not be a useful
measure of a conformation because it can be dominated by a few bad
interactions. For instance, a large molecule with an excellent conformation
for nearly all atoms can have a large overall energy because of a single bad
interaction, such as two atoms that are too near each other in space and
have a huge van der Waals repulsion energy. It is often preferable to carry
out energy minimization on a conformation to find the best nearby
conformation. Energy minimization is usually performed by gradient
optimization: atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
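The gradient optimization described above can be illustrated with a very small sketch. It is not the procedure used by any particular modelling package. It assumes a single pair of atoms interacting through a Lennard-Jones potential in reduced units (epsilon = sigma = 1), and the step size, step cap and tolerance are illustrative choices. The two atoms start too close together, which gives the large van der Waals repulsion mentioned in the text, and steepest descent moves them apart until the force is nearly zero.

```python
# Minimal steepest-descent sketch for two atoms (assumed Lennard-Jones
# potential in reduced units; all parameters here are illustrative).

def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy (the negative force)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.01, max_move=0.1, tol=1e-8, max_iter=100000):
    """Move downhill along -dE/dr until the gradient is below tol."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        # Cap each move so the huge initial repulsion cannot overshoot.
        move = max(-max_move, min(max_move, -step * grad))
        r += move
    return r


r_start = 0.8                 # atoms too close: large repulsive energy
r_min = minimize(r_start)     # nearest local minimum of the potential
e_start = lj_energy(r_start)
e_min = lj_energy(r_min)
```

Because the search only ever moves downhill from the starting point, it ends in the nearest minimum, which is the local-minimum behaviour noted in the text.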
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result
List of potentially matching sequences:
Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                 27     60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Residue   Count   Percent
Ala (A)   74      14.1%
Arg (R)   33      6.3%
Asn (N)   11      2.1%
Asp (D)   21      4.0%
Cys (C)   5       1.0%
Gln (Q)   32      6.1%
Glu (E)   28      5.4%
Gly (G)   51      9.8%
His (H)   16      3.1%
Ile (I)   15      2.9%
Leu (L)   81      15.5%
Lys (K)   10      1.9%
Met (M)   12      2.3%
Phe (F)   10      1.9%
Pro (P)   28      5.4%
Ser (S)   27      5.2%
Thr (T)   22      4.2%
Trp (W)   4       0.8%
Tyr (Y)   5       1.0%
Val (V)   38      7.3%
Pyl (O)   0       0.0%
Sec (U)   0       0.0%
(B)       0       0.0%
(Z)       0       0.0%
(X)       0       0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
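The composition figures above can be cross-checked with a few lines of arithmetic. The sketch below is not ProtParam itself. It simply takes the residue counts reported in the table, recomputes each percentage as count divided by sequence length, and sums the charged residues. The dictionary name and layout are illustrative.

```python
# Residue counts copied from the ProtParam table for GLCTK_HUMAN (Q8IVS8).
counts = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}

# Sequence length is the sum of all residue counts.
total = sum(counts.values())

# Percentage composition, rounded to one decimal place as in the table.
percent = {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}

# Charged-residue totals as defined in the report.
negative = counts["Asp"] + counts["Glu"]
positive = counts["Arg"] + counts["Lys"]

for aa in sorted(counts):
    print(f"{aa}  {counts[aa]:3d}  {percent[aa]:5.1f}%")
print("Negatively charged (Asp + Glu):", negative)
print("Positively charged (Arg + Lys):", positive)
```

The counts add up to the stated sequence length of 523, and the charged-residue totals agree with the 49 and 43 reported above.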
Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S   17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
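The two extinction coefficients can be reproduced from the residue counts with the standard additive estimate at 280 nm, in which each Tyr contributes 1490, each Trp 5500 and each cystine 125 M-1 cm-1. The sketch below assumes those per-residue values and the molecular weight of 55252.6 read from the ProtParam output. The function names are illustrative and are not part of any tool.

```python
# Per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1).
# Assumed standard values for the additive estimate; not taken from the report.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cystine):
    """Additive estimate of the molar extinction coefficient at 280 nm."""
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution over a 1 cm path."""
    return ext / molecular_weight


# Counts and molecular weight as reported for GLCTK_HUMAN.
N_TYR, N_TRP, N_CYS = 5, 4, 5
MOLECULAR_WEIGHT = 55252.6

# Five Cys residues can form at most two cystines (one Cys stays unpaired).
ext_all_cystine = extinction_coefficient(N_TYR, N_TRP, N_CYS // 2)
ext_no_cystine = extinction_coefficient(N_TYR, N_TRP, 0)

abs_all_cystine = round(absorbance_1_g_per_l(ext_all_cystine, MOLECULAR_WEIGHT), 3)
abs_no_cystine = round(absorbance_1_g_per_l(ext_no_cystine, MOLECULAR_WEIGHT), 3)
```

With these assumptions the sketch gives the same four numbers as the ProtParam output, which also supports reading the molecular weight as 55252.6.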
Secondary structure prediction
By SOPMA (result for UNK_158250)

Sequence length: 523

SOPMA:
Alpha helix       (Hh): 235 is 44.93%
3-10 helix        (Gg):   0 is  0.00%
Pi helix          (Ii):   0 is  0.00%
Beta bridge       (Bb):   0 is  0.00%
Extended strand   (Ee):  80 is 15.30%
Beta turn         (Tt):  36 is  6.88%
Bend region       (Ss):   0 is  0.00%
Random coil       (Cc): 172 is 32.89%
Ambiguous states      :   0 is  0.00%
Other states          :   0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: Aa
5. Output file: clustalw2-20080510-09552541output
6. Alignment file: clustalw2-20080510-09552541aln
7. Guide tree file: clustalw2-20080510-09552541dnd
8. Your input file: clustalw2-20080510-09552541input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
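The pairwise scores in the table behave like percent identities: near 98 between the HBV strains and only 2 to 3 against human glycerate kinase. The sketch below shows one common way to compute such a score from two rows of an alignment. It assumes identity is counted over columns where neither sequence has a gap. The exact denominator that ClustalW2 uses is not stated in this report, so the numbers need not match the table exactly. The function name is illustrative, and the two example fragments are the first aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment in this section.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# First aligned residues of two strains, copied from the alignment block.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"

score = percent_identity(hbva4, hbva5)
print(f"HBVA4 vs HBVA5 (fragment): {score:.1f}% identity")
```

The two fragments differ at a single position (V versus F), which is the kind of difference behind the scores of 97 to 98 among the HBV strains.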
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
62
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
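The similarity between two rows of such an alignment is commonly summarised as percent identity. The following is a minimal sketch, not the method used by the alignment program: the function name `percent_identity` is hypothetical, and the choice to skip every column in which either row has a gap is an assumption (other conventions divide by the full alignment length). The two example rows are taken from the first HBeAg block of the alignment, with the shared leading gap columns trimmed.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over the columns where both aligned rows hold a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Rows for P17099|HBEAG_HBVA4 and Q64896|HBEAG_ASHV (shared gap columns trimmed).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(f"identity over compared columns: {percent_identity(hbva4, ashv):.1f}%")
```

Applied to every pair of rows, the same function gives a simple identity matrix that can be compared with the branch lengths of the guide tree.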
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
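A guide tree written in standard Newick notation (nested parentheses, with `name:branch_length` pairs separated by commas) can be read with a few lines of code. The sketch below is only an illustration: the helper `newick_leaves` is hypothetical, it extracts the leaf names and their terminal branch lengths and ignores the tree topology, and the three-leaf tree used in the example is made up.

```python
import re

# A leaf is a name followed by ':' and a branch length. Internal nodes appear
# as '):length' and are skipped because ')' cannot be part of a name.
_LEAF = re.compile(r"([^(),:;\s]+):([0-9.]+)")


def newick_leaves(newick: str) -> list[tuple[str, float]]:
    """Return (leaf name, terminal branch length) pairs in order of appearance."""
    return [(name, float(length)) for name, length in _LEAF.findall(newick)]


example = "((A:0.1,B:0.2):0.3,C:0.4);"  # made-up tree, for illustration only
for name, length in newick_leaves(example):
    print(name, length)
```

For full tree handling (topology, rerooting, drawing a phylogram) a dedicated phylogenetics library is the better choice; this sketch only shows how the notation is organised.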
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202@gmail.com (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations of the two angles are possible for a polypeptide. Because bond lengths and bond angles along the backbone are nearly fixed, these two dihedral angles largely determine the local backbone geometry, including the relative positions of successive alpha-carbon atoms.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
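To make the φ and ψ angles of the Ramachandran plot concrete, the sketch below computes a dihedral angle from four atom positions. It is an illustration under stated assumptions rather than the routine used by SAVS or PROCHECK: the function name `dihedral` is hypothetical and the coordinates in the example are made up. For a real residue, φ is the dihedral of the atoms C(i-1), N(i), CA(i), C(i) and ψ is the dihedral of N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees defined by four points (range -180 to 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth atom is rotated a quarter turn about the
# central bond relative to the first atom.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Plotting the (φ, ψ) pair of every residue on a square from -180 to 180 degrees on both axes gives the Ramachandran plot.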
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                        Score   %age
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
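The percentages in the Procheck summary follow directly from the residue counts, and the 90% guideline can be checked the same way. The short sketch below recomputes them from the four counts reported above (990, 104, 11 and 51 residues); the function and variable names are hypothetical and are not part of Procheck.

```python
def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Residue counts taken from the Procheck summary in this report.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = region_percentages(counts)
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("over 90% in most favoured regions:", meets_guideline)
```

With 85.6% of residues in the most favoured regions the model falls short of the 90% guideline, which suggests that further refinement of the model would be worthwhile.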
Docking result (Hex software)
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds [71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 kJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS       Bumps
----   -------   -------   -------   -------   -------   -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
Clst  Soln  Models   Etotal   Eshape   Eforce   Eair     Vshape   Vclash  Bmp  RMS
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
   1     1  000:000      0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000:000      0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
Conclusion
After analyzing the protein sequences of hepatitis B virus, we conclude
that although they are all closely related, each plays an important role in
survival in a different host species. It would be interesting to look more
closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will become possible in the near future to draw firm conclusions
about the true picture of evolution and to identify the genes responsible
for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we carried out homology modelling and present a theoretical model
built with the available online modelling tools.
HBeAg-binding protein 4 (glycerate kinase), the protein encoded by the gene
studied here, is considered one of the secondary causes of the
pathogenicity of HBV. We therefore tried to dock this protein with an
appropriate ligand in order to inhibit its activity, as a basis on which
drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search
for a cure for hepatitis B, and may also open new avenues for finding other
possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases to identify possible therapeutic sites in them, so that
we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work
remains to be done to establish a sound phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope that these findings will go a
long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
OUTPUT
The output comprises the plots together with a detailed residue-by-residue
listing. The program generates a number of output files in the default
directory; they have the same name as the original PDB file but different
extensions. The residue-by-residue listing has a .out extension and lists
all the computed stereochemical properties, by residue, in a printable
ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds,
angles and dihedrals). Energy minimization can repair distorted geometries
by moving atoms to release internal constraints. It is good at releasing
local constraints for a residue, but it will not pass through high energy
barriers and therefore stops in a local minimum.
The potential energy, calculated by summing the energies of the various
interactions, is a numerical value for a single conformation. This number
can be used to evaluate a particular conformation, but it may not be a
useful measure because it can be dominated by a few bad interactions. For
instance, a large molecule with an excellent conformation for nearly all
atoms can have a large overall energy because of a single bad interaction,
such as two atoms too near each other in space and having a huge van der
Waals repulsion energy. It is often preferable to carry out energy
minimization on a conformation to find the best nearby conformation.
Energy minimization is usually performed by gradient optimization: atoms
are moved so as to reduce the net forces on them. The minimized structure
has small forces on each atom and therefore serves as an excellent starting
point for molecular dynamics simulations.
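As an illustration of gradient optimization, the following minimal sketch relaxes two atoms that start too close together. It assumes a Lennard-Jones potential with unit parameters, a fixed step size and a capped displacement per step; the function names and all numerical values are hypothetical and are not taken from the force field or software used in this work.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r0, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest descent: move against the gradient until the force is small."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so that a huge repulsive force cannot
        # throw the atoms far apart in a single step.
        move = max(-0.1, min(0.1, -step * g))
        r += move
    return r


r_start = 0.9  # atoms too close: strong van der Waals repulsion
r_min = minimize(r_start)
print("start energy:", lj_energy(r_start))
print("minimized distance:", r_min, "energy:", lj_energy(r_min))
```

The search only walks downhill from the starting distance, so it ends in the nearest minimum; in a real molecule with many minima this is the local-minimum behaviour described above.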
Results and Discussion
1. Swiss-Prot entry -- Protein sequence: Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences

Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina                 197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp       27  6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                  27  6.0

Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition
Residue   Count  Percent
Ala (A)    74    14.1%
Arg (R)    33     6.3%
Asn (N)    11     2.1%
Asp (D)    21     4.0%
Cys (C)     5     1.0%
Gln (Q)    32     6.1%
Glu (E)    28     5.4%
Gly (G)    51     9.8%
His (H)    16     3.1%
Ile (I)    15     2.9%
Leu (L)    81    15.5%
Lys (K)    10     1.9%
Met (M)    12     2.3%
Phe (F)    10     1.9%
Pro (P)    28     5.4%
Ser (S)    27     5.2%
Thr (T)    22     4.2%
Trp (W)     4     0.8%
Tyr (Y)     5     1.0%
Val (V)    38     7.3%
Pyl (O)     0     0.0%
Sec (U)     0     0.0%
(B)         0     0.0%
(Z)         0     0.0%
(X)         0     0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
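The extinction coefficients and absorbance values can be checked by hand from the residue counts. The sketch below assumes the molar extinction values usually quoted for this calculation (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1) and reads the molecular weight as 55252.6; the dictionary layout and function names are hypothetical and only serve the illustration.

```python
# Residue counts for GLCTK_HUMAN as listed in the ProtParam composition.
COUNTS = {"Trp": 4, "Tyr": 5, "Cys": 5, "Leu": 81,
          "Asp": 21, "Glu": 28, "Arg": 33, "Lys": 10}
MOL_WEIGHT = 55252.6   # assumed reading of the reported molecular weight
LENGTH = 523

# Assumed molar extinction values at 280 nm (M-1 cm-1).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(counts, all_cys_paired):
    """Sum the contributions of Trp, Tyr and (optionally) cystine pairs."""
    n_cystine = counts["Cys"] // 2 if all_cys_paired else 0
    return (counts["Trp"] * EXT_TRP
            + counts["Tyr"] * EXT_TYR
            + n_cystine * EXT_CYSTINE)


def absorbance_1_g_per_l(ext, mol_weight):
    """Absorbance of a 1 g/l solution in a 1 cm path."""
    return ext / mol_weight


def percent(count, length):
    """Share of one residue type in the sequence, in percent."""
    return 100.0 * count / length


ext_paired = extinction_coefficient(COUNTS, all_cys_paired=True)
ext_reduced = extinction_coefficient(COUNTS, all_cys_paired=False)
negative = COUNTS["Asp"] + COUNTS["Glu"]
positive = COUNTS["Arg"] + COUNTS["Lys"]

print(ext_paired, round(absorbance_1_g_per_l(ext_paired, MOL_WEIGHT), 3))
print(ext_reduced, round(absorbance_1_g_per_l(ext_reduced, MOL_WEIGHT), 3))
print(negative, positive, round(percent(COUNTS["Leu"], LENGTH), 1))
```

With five cysteines only two cystine pairs can form, which is why the two coefficients differ by 250.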
Secondary structure prediction
By SOPMA
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
3-10 helix (Gg):          0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states (?):     0 is  0.00%
Other states:             0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
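To make the relation between aligned rows and a percent score concrete, the sketch below counts identical residues between two rows of an alignment, skipping columns where either row has a gap. This is a simplified stand-in: ClustalW derives its scores table from separate pairwise alignments, so values computed from rows of the multiple alignment need not match the table exactly. The function name is hypothetical, and the two fragments are the 32-residue stretch of P17099 and Q91C37 that ends at position 117.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns without gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # a gap in either row is not a comparable position
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Aligned fragments of P17099|HBEAG_HBVA4 and Q91C37|HBEAG_HBVA6.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"

print(percent_identity(hbva4, hbva6))
```

The two fragments differ only in their first two residues, so 30 of the 32 compared positions are identical.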
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
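The guide tree is written in Newick notation: nested parentheses group sequences, and the number attached to each name or group is its branch length. The sketch below reads such a string and sums the branch lengths from the outermost node down to every sequence. It assumes the branch lengths are read with five decimals (0.59519, 0.01176 and so on); the parser is a minimal hypothetical helper written for this illustration, not part of ClustalW or PHYLIP.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_depths(newick):
    """Return {leaf name: summed branch length from the outermost node}."""
    text = newick.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def read_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            return float(read_label())
        return 0.0  # the outermost group carries no branch length

    def parse_subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            leaves = {}
            while True:
                leaves.update(parse_subtree())
                if text[pos] == ",":
                    pos += 1
                    continue
                break
            pos += 1  # skip ")"
            read_label()  # optional internal label, ignored here
            length = read_length()
            return {name: d + length for name, d in leaves.items()}
        name = read_label()
        return {name: read_length()}

    return parse_subtree()


depths = leaf_depths(GUIDE_TREE)
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```

Sorted this way, the HBV genotype A sequences sit close to the outermost node while GLCTK_HUMAN lies at the end of the longest path, which agrees with its very low pairwise scores.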
Phylogram
Tertiary structure prediction
pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from File (PDB file), then choose Swiss-Model > Load raw target sequence.
Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.
Choose File > Save > Layer (PDB).
Choose File > Save > Project (PDB).
Choose Swiss-Model > Submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File > Save > Layer (PDB).
Open Swiss-Model and select the Load raw sequence option to load the target molecule.
Perform Magic fit and Iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "Submit modeling request" under Swiss-Model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Your Email address: Lakshay1202@gmail.com
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays these two backbone conformational angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input for SAVS.
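Each point of a Ramachandran plot is a pair of dihedral angles: φ is defined by the atoms C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four atomic positions. The coordinates in the example are made-up points chosen so that the result is easy to check, not atoms from the model, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Remove the components of the outer bonds that lie along the axis,
    # leaving their projections onto the plane perpendicular to it.
    k0 = _dot(b0, axis)
    k2 = _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the central bond runs along the z axis.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral(a, b, c, (1.0, 0.0, 1.0))
quarter_turn = dihedral(a, b, c, (0.0, 1.0, 1.0))
trans = dihedral(a, b, c, (-1.0, 0.0, 1.0))
print(cis, quarter_turn, trans)
```

Applying the same function to the four backbone atoms listed above for every residue of a structure gives the (φ, ψ) pairs that the plot displays.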
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score     %
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
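The percentages in the plot statistics follow directly from the residue counts, and they can be compared with the 90% guideline quoted above. The sketch below is a small worked check; the dictionary keys and function names are hypothetical, and the counts are the four region totals from the Procheck summary.

```python
# Region counts from the Procheck plot statistics.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Share of each region among non-Gly, non-Pro residues, in percent."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if the most favoured regions hold more than `threshold` percent."""
    return region_percentages(counts)["most favoured"] > threshold


percentages = region_percentages(REGION_COUNTS)
for region, value in percentages.items():
    print(f"{region:20s} {value:5.1f}%")
print("meets 90% guideline:", meets_guideline(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, so the residues in the disallowed regions are the natural candidates for further refinement.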
Docking result ---- by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair.
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln   Etotal    Eshape    Eforce    Eair      RMS              Bumps
 ----   --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000/000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000/000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
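The size of the search space reported in the Hex log can be checked by hand. The short Python sketch below assumes that the bracketed factors in the log read Iterate[27,812,1] x FFT[64,24,48]; this reading is an editorial assumption, chosen because it reproduces the totals that the log itself prints (21924 3D FFTs and 1616412672 orientations). The variable names are illustrative only.

```python
# Distance samples reported in the log: 0.00 to 19.50 in steps of 0.75.
step = 0.75
n_steps = int(round(19.50 / step)) + 1
distances = [round(i * step, 2) for i in range(n_steps)]

# Assumed reading of "Iterate": distance steps x receptor rotations x ligand rotations.
iterate = n_steps * 812 * 1
# Assumed reading of "FFT": the dimensions of the 3D FFT grid.
fft_grid = 64 * 24 * 48

total_orientations = iterate * fft_grid

print(n_steps, distances[-1])    # number of distance steps and the last distance
print(iterate)                   # number of 3D FFTs
print(total_orientations)        # size of the 6D search space
```

Under this reading, the 27 distance steps are exactly the R values listed in the log, which supports the interpretation but does not prove it.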
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we
conclude that although the sequences are all closely related, each plays an
important role in survival in its host species. It would be interesting to
take a closer look at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing projects on HBV are finished, we hope it
will be possible to draw firm conclusions about the true picture of its
evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we used homology modelling and took a step towards presenting a
theoretical model built with available online modelling tools.
In our study, the glycerate kinase (HBeAg-binding protein 4) protein
appears to be one of the secondary factors in the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards
such discoveries; it is a small finding on a large issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search
for a cure for hepatitis B, and may also open new avenues for finding other
possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases to identify possible therapeutic sites in them, so that
we can look forward to a better, disease-free world.
The work presented is just a small part of a large issue, and much work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for hepatitis B. We hope that these findings will go a
long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
optimization, atoms are moved so as to reduce the net forces on them. The
minimized structure has small forces on each atom and therefore serves as
an excellent starting point for molecular dynamics simulations.
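The idea of moving atoms along the net force until the forces become small can be illustrated with a toy steepest-descent loop. This is only a sketch under simplifying assumptions: two atoms on a line joined by a single harmonic bond, with a made-up force constant, equilibrium length and step size. It is not the force field or the minimizer used in this work.

```python
def bond_forces(x1, x2, k=100.0, r0=1.5):
    """Forces on two atoms joined by a harmonic bond E = 0.5 * k * (r - r0)**2."""
    r = x2 - x1
    f2 = -k * (r - r0)       # force on atom 2; atom 1 feels the opposite force
    return -f2, f2

def minimize(x1, x2, step=0.001, tol=1e-6, max_iter=10000):
    """Steepest descent: move each atom along its force until forces are small."""
    for _ in range(max_iter):
        f1, f2 = bond_forces(x1, x2)
        if max(abs(f1), abs(f2)) < tol:
            break
        x1 += step * f1
        x2 += step * f2
    return x1, x2

# Start from a stretched bond (length 2.5) and relax it.
x1, x2 = minimize(0.0, 2.5)
print(round(x2 - x1, 4))     # bond length after minimization
```

In a real minimization the same loop runs over all atoms in three dimensions with a full force field, and more efficient schemes such as conjugate gradients are commonly used.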
Results and discussion
1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences (query sequence included)
Db   AC      Description                                                           Score  E-value
pdb  1QGT-C  Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5                   206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                   197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                         192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp         27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                    27    6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue    Count   Per cent
Ala (A)      74     14.1%
Arg (R)      33      6.3%
Asn (N)      11      2.1%
Asp (D)      21      4.0%
Cys (C)       5      1.0%
Gln (Q)      32      6.1%
Glu (E)      28      5.4%
Gly (G)      51      9.8%
His (H)      16      3.1%
Ile (I)      15      2.9%
Leu (L)      81     15.5%
Lys (K)      10      1.9%
Met (M)      12      2.3%
Phe (F)      10      1.9%
Pro (P)      28      5.4%
Ser (S)      27      5.2%
Thr (T)      22      4.2%
Trp (W)       4      0.8%
Tyr (Y)       5      1.0%
Val (V)      38      7.3%
Pyl (O)       0      0.0%
Sec (U)       0      0.0%
(B)           0      0.0%
(Z)           0      0.0%
(X)           0      0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
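The two extinction coefficients follow directly from the residue counts. The sketch below assumes the commonly used molar absorptivities at 280 nm (Trp 5500, Tyr 1490 and cystine 125 M-1 cm-1), which the report itself does not state, and reads the molecular weight as 55252.6 Da. The function name is hypothetical.

```python
# Molar absorptivities at 280 nm in M^-1 cm^-1 (assumed values, not given in the report).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    n_cystine = n_cys // 2 if cys_paired else 0    # two Cys form one cystine
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

mol_weight = 55252.6          # Da, molecular weight from the ProtParam output
results = {}
for paired in (True, False):
    ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=paired)
    abs_01 = round(ext / mol_weight, 3)   # absorbance of a 1 g/l solution
    results[paired] = (ext, abs_01)
    print(paired, ext, abs_01)
```

With 4 Trp, 5 Tyr and 5 Cys residues this gives the same pairs of values as the ProtParam output, which also supports reading the molecular weight as 55252.6.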
Secondary structure prediction
By SOPMA
Result for: UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh):       235 is 44.93%
3-10 helix (Gg):          0 is  0.00%
Pi helix (Ii):            0 is  0.00%
Beta bridge (Bb):         0 is  0.00%
Extended strand (Ee):    80 is 15.30%
Beta turn (Tt):          36 is  6.88%
Bend region (Ss):         0 is  0.00%
Random coil (Cc):       172 is 32.89%
Ambiguous states (?):     0 is  0.00%
Other states:             0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                  Len(aa)  SeqB  Name                  Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN    523      2     Q64896|HBEAG_ASHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      3     P03154|HBEAG_DHBV1    305      3
1     Q8IVS8|GLCTK_HUMAN    523      4     P0C6J9|HBEAG_DHBV3    305      3
1     Q8IVS8|GLCTK_HUMAN    523      5     P03153|HBEAG_GSHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      6     P0C692|HBEAG_HBVA2    214      3
1     Q8IVS8|GLCTK_HUMAN    523      7     P0C625|HBEAG_HBVA3    214      3
1     Q8IVS8|GLCTK_HUMAN    523      8     P17099|HBEAG_HBVA4    214      3
1     Q8IVS8|GLCTK_HUMAN    523      9     Q81105|HBEAG_HBVA5    214      3
1     Q8IVS8|GLCTK_HUMAN    523      10    Q91C37|HBEAG_HBVA6    214      2
2     Q64896|HBEAG_ASHV     217      3     P03154|HBEAG_DHBV1    305      21
2     Q64896|HBEAG_ASHV     217      4     P0C6J9|HBEAG_DHBV3    305      21
2     Q64896|HBEAG_ASHV     217      5     P03153|HBEAG_GSHV     217      91
2     Q64896|HBEAG_ASHV     217      6     P0C692|HBEAG_HBVA2    214      66
2     Q64896|HBEAG_ASHV     217      7     P0C625|HBEAG_HBVA3    214      65
2     Q64896|HBEAG_ASHV     217      8     P17099|HBEAG_HBVA4    214      65
2     Q64896|HBEAG_ASHV     217      9     Q81105|HBEAG_HBVA5    214      65
2     Q64896|HBEAG_ASHV     217      10    Q91C37|HBEAG_HBVA6    214      65
3     P03154|HBEAG_DHBV1    305      4     P0C6J9|HBEAG_DHBV3    305      97
3     P03154|HBEAG_DHBV1    305      5     P03153|HBEAG_GSHV     217      24
3     P03154|HBEAG_DHBV1    305      6     P0C692|HBEAG_HBVA2    214      26
3     P03154|HBEAG_DHBV1    305      7     P0C625|HBEAG_HBVA3    214      27
3     P03154|HBEAG_DHBV1    305      8     P17099|HBEAG_HBVA4    214      25
3     P03154|HBEAG_DHBV1    305      9     Q81105|HBEAG_HBVA5    214      26
3     P03154|HBEAG_DHBV1    305      10    Q91C37|HBEAG_HBVA6    214      26
4     P0C6J9|HBEAG_DHBV3    305      5     P03153|HBEAG_GSHV     217      25
4     P0C6J9|HBEAG_DHBV3    305      6     P0C692|HBEAG_HBVA2    214      26
4     P0C6J9|HBEAG_DHBV3    305      7     P0C625|HBEAG_HBVA3    214      27
4     P0C6J9|HBEAG_DHBV3    305      8     P17099|HBEAG_HBVA4    214      25
4     P0C6J9|HBEAG_DHBV3    305      9     Q81105|HBEAG_HBVA5    214      26
4     P0C6J9|HBEAG_DHBV3    305      10    Q91C37|HBEAG_HBVA6    214      26
5     P03153|HBEAG_GSHV     217      6     P0C692|HBEAG_HBVA2    214      70
5     P03153|HBEAG_GSHV     217      7     P0C625|HBEAG_HBVA3    214      69
5     P03153|HBEAG_GSHV     217      8     P17099|HBEAG_HBVA4    214      69
5     P03153|HBEAG_GSHV     217      9     Q81105|HBEAG_HBVA5    214      69
5     P03153|HBEAG_GSHV     217      10    Q91C37|HBEAG_HBVA6    214      69
6     P0C692|HBEAG_HBVA2    214      7     P0C625|HBEAG_HBVA3    214      98
6     P0C692|HBEAG_HBVA2    214      8     P17099|HBEAG_HBVA4    214      98
6     P0C692|HBEAG_HBVA2    214      9     Q81105|HBEAG_HBVA5    214      98
6     P0C692|HBEAG_HBVA2    214      10    Q91C37|HBEAG_HBVA6    214      98
7     P0C625|HBEAG_HBVA3    214      8     P17099|HBEAG_HBVA4    214      98
7     P0C625|HBEAG_HBVA3    214      9     Q81105|HBEAG_HBVA5    214      97
7     P0C625|HBEAG_HBVA3    214      10    Q91C37|HBEAG_HBVA6    214      98
8     P17099|HBEAG_HBVA4    214      9     Q81105|HBEAG_HBVA5    214      97
8     P17099|HBEAG_HBVA4    214      10    Q91C37|HBEAG_HBVA6    214      98
9     Q81105|HBEAG_HBVA5    214      10    Q91C37|HBEAG_HBVA6    214      97
===========================================================================
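The pairwise scores in the table are percent identities between pairs of sequences. As a rough illustration (not ClustalW's actual implementation), percent identity over an existing gapped alignment can be computed by counting identical columns and dividing by the number of columns in which neither sequence has a gap. The two fragments below are the first aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5 as they appear in the alignment in this report; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue                  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Fragments copied from the first residue block of the alignment.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
pid = percent_identity(hbva4, hbva5)
print(round(pid, 2))
```

The value for this short fragment is not expected to equal the full-length score in the table, because the table entry is computed over the complete pairwise alignment.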
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
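The guide tree is written in Newick format, in which each leaf is a label followed by a colon and a branch length. The sketch below pulls out the leaf branch lengths with a regular expression. It assumes that the colons, commas and decimal points of the tree were lost in extraction and restores them (an editorial reading, not the original file), and it is a minimal illustration rather than a full Newick parser.

```python
import re

# Guide tree with the Newick punctuation restored (editorial assumption).
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is "<label>:<length>". Internal branch lengths follow ")" and are
# skipped because no label character sits directly in front of their colon.
LEAF = re.compile(r"([A-Za-z0-9_|]+):([0-9.]+)")

leaf_lengths = {name: float(length) for name, length in LEAF.findall(NEWICK)}
balanced = NEWICK.count("(") == NEWICK.count(")")
longest = max(leaf_lengths, key=leaf_lengths.get)

print(len(leaf_lengths), balanced)
print(longest, leaf_lengths[longest])
```

The longest leaf branch belongs to the human glycerate kinase sequence, in line with its very low pairwise scores against the viral HBeAg sequences.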
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C, which showed around 85.6% identity with the target sequence, was selected as the template, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file); choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode: request submission form
Please fill these fields
Your e-mail address: Lakshay1202gmailcom (MUST be correct)
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide and displays these angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input for SAVS.
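Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows one standard way to compute a signed dihedral angle from four atomic coordinates. The coordinates are made-up test points rather than atoms from the modelled structure, and the helper names are illustrative.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Four made-up points with the central bond along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, right_angle, trans)
```

Applying such a function to the backbone atoms of every residue gives the (φ, ψ) pairs that programs such as PROCHECK place on the plot.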
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score    %age
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
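The percentages in the PROCHECK summary can be reproduced from the residue counts and compared with the 90 per cent rule of thumb quoted above. The following sketch does this; the dictionary keys are illustrative labels, not PROCHECK identifiers.

```python
# Residue counts per Ramachandran region, taken from the PROCHECK summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())     # non-glycine, non-proline residues
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region}: {pct}%")

# Rule of thumb: a good quality model has over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("meets the 90% guideline:", meets_guideline)
```

For this model the most favoured fraction falls short of the 90 per cent guideline, so the structure would normally be examined further before being relied upon.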
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
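As a check on the search size reported in the log, the short sketch below recomputes the number of 3D FFTs and the total number of sampled orientations. It assumes that the Iterate and FFT entries on the "Total 6D space" line read as 27 x 812 x 1 and 64 x 24 x 48, and that the distance range is 0.00 to 19.50 in steps of 0.75. The punctuation is lost in the log text, so these groupings are our reading and not confirmed program output.

```python
from math import prod

# Assumed reading of the log line "Total 6D space Iterate[...] x FFT[...]":
# 27 distance steps x 812 receptor rotations x 1 ligand rotation,
# evaluated on a 64 x 24 x 48 FFT grid.
ITERATE = (27, 812, 1)
FFT_GRID = (64, 24, 48)


def distance_steps(r_min, r_max, step):
    """Number of sampled intermolecular distances, both ends included."""
    return int(round((r_max - r_min) / step)) + 1


n_steps = distance_steps(0.0, 19.5, 0.75)   # assumed range and step size
n_ffts = prod(ITERATE)                      # one 3D FFT per iterate combination
n_orientations = n_ffts * prod(FFT_GRID)    # every FFT scores a full grid

print("distance steps:", n_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Under these assumptions the products agree with the "Done 21924 3D FFTs for 1616412672 orientations" line of the log, which supports this reading of the garbled figures.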
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, each plays an important role in
survival in a different host species. The question deserves a closer look at
the gene level, where a phylogenetic analysis can be very helpful in
understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of evolution and
to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and present a theoretical model
built with the available online modelling tools.
Our study indicates that HBeAg-binding protein 4 (glycerate kinase), the
protein encoded by this gene, is a secondary contributor to the pathogenicity
of HBV. We therefore docked this protein with an appropriate ligand in order
to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries, and is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data as well as
interpreting the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for
the infectious disease bird flu, and will also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases to identify possible therapeutic sites in them, so that
we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology,
6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari.
Department of Molecular and Experimental Medicine, Scripps Research
Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason.
Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia
University, New York, NY 10032, USA.
[6] Al-Lazikani B, Sheinerman FB, and Honig B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796–14801. [PubMed]
Aloy P, Querol E, Aviles FX, and Sternberg MJ. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
[PubMed]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller
W, and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new
generation of protein database search programs. Nucleic Acids Res. 25:
3389–3402. [PubMed]
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas
M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro:
An integrated documentation resource for protein families, domains and
functional sites. Bioinformatics.
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviation
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8
Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31, HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level
BLAST result
List of potentially matching sequences
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5                206   6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina                197   2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192   6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp      27   60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                 27   60
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Residue   Count   %
Ala (A)    74    14.1
Arg (R)    33     6.3
Asn (N)    11     2.1
Asp (D)    21     4.0
Cys (C)     5     1.0
Gln (Q)    32     6.1
Glu (E)    28     5.4
Gly (G)    51     9.8
His (H)    16     3.1
Ile (I)    15     2.9
Leu (L)    81    15.5
Lys (K)    10     1.9
Met (M)    12     2.3
Phe (F)    10     1.9
Pro (P)    28     5.4
Ser (S)    27     5.2
Thr (T)    22     4.2
Trp (W)     4     0.8
Tyr (Y)     5     1.0
Val (V)    38     7.3
Pyl (O)     0     0.0
Sec (U)     0     0.0
(B)         0     0.0
(Z)         0     0.0
(X)         0     0.0
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
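The atomic composition can be cross-checked against the formula with a few lines of Python. This is only a minimal sketch: the helper name parse_formula is ours, and it handles simple formulas of the form element symbol followed by a count, which is all that is needed here.

```python
import re


def parse_formula(formula):
    """Split a formula such as 'C2435H3967N711O719S17' into element counts."""
    return {element: int(count)
            for element, count in re.findall(r"([A-Z][a-z]?)(\d+)", formula)}


atom_counts = parse_formula("C2435H3967N711O719S17")
total_atoms = sum(atom_counts.values())

print(atom_counts)
print("total atoms:", total_atoms)
```

The sum reproduces the total number of atoms reported by ProtParam.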
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
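The two extinction coefficients follow from the residue counts alone. The sketch below uses the molar extinction values that ProtParam documents for 280 nm (Tyr 1490, Trp 5500, cystine 125) together with the counts from the composition table (5 Tyr, 4 Trp, 5 Cys). The molecular weight of 55252.6 is our reading of the garbled figure in the extracted text and should be treated as an assumption.

```python
# Molar extinction values at 280 nm, in M-1 cm-1, as documented for ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(n_tyr, n_trp, n_cys, all_cys_paired):
    """Extinction coefficient at 280 nm from Tyr, Trp and Cys counts."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_1g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight


MOLECULAR_WEIGHT = 55252.6  # assumed reading of the ProtParam output

ext_paired = extinction_coefficient(5, 4, 5, all_cys_paired=True)
ext_reduced = extinction_coefficient(5, 4, 5, all_cys_paired=False)

print(ext_paired, round(absorbance_1g_per_l(ext_paired, MOLECULAR_WEIGHT), 3))
print(ext_reduced, round(absorbance_1g_per_l(ext_reduced, MOLECULAR_WEIGHT), 3))
```

Both coefficients and both absorbance values agree with the ProtParam output listed above.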
Secondary structure prediction
By SOPMA (result for UNK_158250)
Sequence length: 523
SOPMA:
Alpha helix (Hh)      235 is 44.93%
310 helix (Gg)          0 is  0.00%
Pi helix (Ii)           0 is  0.00%
Beta bridge (Bb)        0 is  0.00%
Extended strand (Ee)   80 is 15.30%
Beta turn (Tt)         36 is  6.88%
Bend region (Ss)        0 is  0.00%
Random coil (Cc)      172 is 32.89%
Ambiguous states        0 is  0.00%
Other states            0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
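The pairwise scores in this table are percent identities. As a rough illustration of how such a score is obtained, the sketch below counts identical residues over the columns of two aligned sequences in which neither has a gap. It is a simplified stand-in: ClustalW computes its scores on separate pairwise alignments, so the function name percent_identity and the choice to reuse rows of the multiple alignment are our assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap columns are not counted as compared positions
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# First aligned residues of HBEAG_HBVA4 and HBEAG_HBVA5, copied from the alignment.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(round(percent_identity(hbva4, hbva5), 1))
```

On this short stretch the two human HBV sequences differ at a single position, which is in line with the scores of 97 to 98 reported for the HBVA group, while the human glycerate kinase scores only 2 to 3 against every viral sequence.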
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
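The guide tree is written in Newick format, so the branch lengths can be summed to compare how far each sequence sits from the root. The sketch below uses a small hand-written parser (parse_newick and leaf_depths are illustrative names of ours, not part of ClustalW). The tree string is our reconstruction of the guide tree: the commas, colons and decimal points were lost in the extracted text and are assumed here.

```python
# Assumed reconstruction of the ClustalW guide tree (punctuation restored by us).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, depth=0.0, out=None):
    """Map each leaf name to the summed branch length from the root."""
    out = {} if out is None else out
    name, length, children = tree
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {depth:.5f}")
```

With this reading of the tree, the human glycerate kinase is by far the most distant leaf, followed by the two duck hepatitis B sequences, which matches the low pairwise scores in the table.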
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file)
Choose icon - Swiss model - load the raw target sequence
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon - File - Save - Layer (pdb)
Choose icon - File - Save - Project (pdb)
Choose icon - Swiss model - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon - File - Save - Layer (pdb)
Open Swiss model and select the "load raw sequence" option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title Lakshay projectWill be added to the results header
Your SWISS-MODEL project file can be found in
69
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations of the two angles are sterically possible for a polypeptide. Because the peptide bond is nearly planar, these two angles largely determine the local course of the backbone chain at each residue.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
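The two angles plotted in a Ramachandran plot can be computed directly from backbone atom coordinates. The following is a minimal sketch, not the method used by SAVS or PROCHECK: φ is taken as the dihedral angle C(i-1)-N-CA-C and ψ as N-CA-C-N(i+1), and the helper names `dihedral` and `phi_psi` are hypothetical. The coordinates in the usage lines are toy points, not atoms from the modeled structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Keep only the components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue from five backbone atom positions."""
    phi = dihedral(c_prev, n, ca, c)
    psi = dihedral(n, ca, c, n_next)
    return phi, psi


# Toy geometry: four points with a quarter turn about the z axis.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(quarter_turn, 1))
```

Each residue with both neighbours present gives one (φ, ψ) point; plotting all of them produces the scatter that PROCHECK classifies into favoured, allowed and disallowed regions.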
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics

                                                          SCORE   %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%

Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
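The percentages in the plot statistics follow directly from the residue counts. The short sketch below recomputes them and compares the most favoured fraction with the 90% guideline quoted above; the variable names are illustrative only, and the decimal placement of the reported percentages is inferred from this arithmetic.

```python
# Residue counts per Ramachandran region, as reported by PROCHECK above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

GOOD_MODEL_THRESHOLD = 90.0  # per cent in most favoured regions
meets_guideline = percent["most favoured"] > GOOD_MODEL_THRESHOLD

for region, value in percent.items():
    print(f"{region:20s} {counts[region]:5d} {value:5.1f}%")
print("Meets the 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline, and 51 residues lie in disallowed regions.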
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
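The size of the search reported in the Hex log can be checked with simple arithmetic. The sketch below is an interpretation, not part of the Hex output: it assumes the log entry "Iterate[278121]" reads as 27 distance steps, 812 receptor rotations and 1 ligand rotation, and that "FFT[642448]" reads as a 64 x 24 x 48 grid. These readings are assumptions, chosen because they reproduce the 21924 FFTs and 1616412672 orientations stated in the log.

```python
import math

# Intermolecular distance range from the log: 0.00 to 19.50 in steps of 0.75.
R_MIN, R_MAX, R_STEP = 0.0, 19.5, 0.75
n_steps = int((R_MAX - R_MIN) / R_STEP) + 1
distances = [R_MIN + i * R_STEP for i in range(n_steps)]

# Assumed reading of "Iterate[27,812,1]" and "FFT[64,24,48]".
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

n_ffts = n_steps * receptor_rotations * ligand_rotations
n_orientations = n_ffts * math.prod(fft_grid)

print("distance steps:", n_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Note that the log docks 2B8N against itself (receptor and ligand are the same file) and the final table reports a single solution with all energies equal to zero, so the run did not yield a ranked set of docked poses.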
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling. We took a step forward by presenting a theoretical model built with available online modelling tools.
As we have seen, the HBeAg-binding protein (glycerate kinase) encoded by this gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro - An integrated documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
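The two extinction coefficients can be reproduced from the amino acid composition. The sketch below assumes the per-residue molar extinction values at 280 nm that ProtParam documents (5500 for Trp, 1490 for Tyr and 125 for each cystine). The function names are hypothetical, and the molecular weight is taken as 55252.6, which is an assumption about where the decimal point sat in the original output.

```python
# Molar extinction values at 280 nm in water (M^-1 cm^-1), per residue/pair.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired):
    """Extinction coefficient of the native protein at 280 nm."""
    # Two Cys residues form one cystine; an odd Cys is left unpaired.
    n_cystine = n_cys // 2 if cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext_coefficient, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm path."""
    return ext_coefficient / mol_weight


# Composition of GLCTK_HUMAN from the table above: Trp 4, Tyr 5, Cys 5.
MOL_WEIGHT = 55252.6  # assumed decimal placement of "552526"

ext_paired = extinction_coefficient(4, 5, 5, cys_paired=True)
ext_reduced = extinction_coefficient(4, 5, 5, cys_paired=False)

print(ext_paired, round(absorbance_1_g_per_l(ext_paired, MOL_WEIGHT), 3))
print(ext_reduced, round(absorbance_1_g_per_l(ext_reduced, MOL_WEIGHT), 3))
```

The difference of 250 between the two coefficients corresponds to the two cystines that five Cys residues can form.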
Secondary structure prediction
By SOPMA
Result for: UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh): 235 is 44.93%
310 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
60
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
61
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
62
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
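The similarity between rows of this alignment can be quantified as percent identity over the aligned columns. The sketch below is illustrative only: the function name `percent_identity` and the convention of skipping any column in which either row has a gap are assumptions, and this is not ClustalW's own scoring routine. The two example rows are the HBVA4 and ASHV rows from the second alignment block.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Rows copied from the second block of the alignment (HBVA4 and ASHV).
hbva4 = "----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
ashv = "----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M"
print(f"{percent_identity(hbva4, ashv):.1f}% identity over this block")
```

Applied to full-length rows rather than a single block, the same calculation gives a whole-sequence figure that can be compared with the pairwise scores reported by ClustalW.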
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
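ClustalW writes its guide tree in Newick format, in which each sequence label is followed by a colon and a branch length. As a minimal sketch of how the leaf branch lengths could be read back, the snippet below uses a regular expression. The helper name `newick_leaves` is hypothetical, and the example subtree assumes that the decimals lost from the printed tree are restored in ClustalW's usual five-place form (for example 0.59519).

```python
import re


def newick_leaves(newick: str):
    """Return (label, branch_length) pairs for every leaf in a Newick string."""
    # A leaf is a label (no brackets, commas, colons, semicolons or spaces)
    # directly followed by ':' and a number. Internal branch lengths follow
    # ')' and so carry no label, which means they are not matched.
    pattern = re.compile(r"([^(),:;\s]+):([0-9]*\.?[0-9]+)")
    return [(label, float(length)) for label, length in pattern.findall(newick)]


# Subtree joining the human glycerate kinase to the two duck HBV sequences.
subtree = (
    "(Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054);"
)
for label, length in newick_leaves(subtree):
    print(f"{label:20s} {length:.5f}")
```

Short branch lengths mark near-identical sequences (the two duck HBV entries), while the long branch to GLCTK_HUMAN reflects how distant the human protein is from the viral ones.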
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose icon - Swiss model - load the raw target sequence.
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - file - save - layer (pdb).
Choose icon - file - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - file - save - layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. Because φ and ψ describe the rotations about the two backbone bonds on either side of each alpha carbon, the plot shows at a glance which residues of the protein adopt sterically unfavourable backbone conformations.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
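Each point on a Ramachandran plot is a pair of dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). As a minimal sketch of the underlying geometry, the function below computes a signed dihedral angle from four 3D points. The function name `dihedral` and the test coordinates are illustrative assumptions, not taken from the modelled structure or from the SAVS or PROCHECK code.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    def cross(u, v):
        return (
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        )

    b1 = sub(p1, p0)  # first bond vector
    b2 = sub(p2, p1)  # central bond: the rotation axis
    b3 = sub(p3, p2)  # last bond vector
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: a planar trans arrangement and a 90 degree twist.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Feeding the backbone atom coordinates of consecutive residues from a PDB file into this function would give the φ and ψ values that a validation server plots.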
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                         SCORE    %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
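The percentages in the PROCHECK summary follow directly from the residue counts, and the quality criterion quoted above can be checked the same way. The snippet below is a small sketch of that arithmetic; the dictionary keys and variable names are illustrative, and the counts are read from the plot statistics with the percentage column interpreted as one decimal place.

```python
# Residue counts per Ramachandran region, as reported in the plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Total of non-glycine, non-proline residues.
total = sum(counts.values())
percentages = {region: 100.0 * n / total for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")

# PROCHECK's rule of thumb: a good model has over 90% in the most favoured regions.
meets_criterion = percentages["most favoured"] > 90.0
print("meets the >90% criterion:", meets_criterion)
```

On these counts the model falls short of the 90% guideline, which is worth keeping in mind when interpreting the docking results that follow.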
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
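The size of the search reported in the Hex log can be cross-checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the decimal points and separators in the log were lost, so the readings used here (a distance range of 0.00 to 19.50 in steps of 0.75, `Iterate[27, 812, 1]` and `FFT[64, 24, 48]`) are interpretations, chosen because they reproduce the totals the log itself reports (21924 3D FFTs and 1616412672 orientations). All variable names are illustrative.

```python
# Assumed readings of the garbled log values (see the lead-in above).
r_min, r_max, r_step = 0.0, 19.50, 0.75
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

# Number of intermolecular distances sampled: 0.00, 0.75, ..., 19.50.
n_distances = round((r_max - r_min) / r_step) + 1

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Each FFT evaluates every point of the rotational grid.
grid_points = fft_grid[0] * fft_grid[1] * fft_grid[2]
n_orientations = n_ffts * grid_points

print("distance steps:", n_distances)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Note that the run docks 2B8N against itself (receptor and ligand are the same file), and the summary table lists a single solution with zero energies, so the search size is the main quantitative information this log provides.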
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, each has an important role in survival
in its host species. It would be interesting to take a closer look at the
matter by studying it at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it
will be possible in the near future to draw firm conclusions about the true
picture of evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
In order to determine the unknown structure of the protein present in the
different species, we carried out homology modelling and put forward a
theoretical model using available online modelling tools.
Our study indicates that the HBeAg-binding protein (glycerate kinase) encoded
by this gene is a secondary contributor to the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure for
the infectious disease bird flu; they will also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Primary structure prediction
By ProtParam
GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
    (B)    0    0.0%
    (Z)    0    0.0%
    (X)    0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
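The composition percentages and extinction coefficients above can be re-derived from the residue counts alone. The sketch below is illustrative and is not ProtParam itself: it assumes the commonly used molar extinction values at 280 nm of 5500 for Trp, 1490 for Tyr and 125 for each cystine, and the names COUNTS, composition_percent and extinction_coefficient are hypothetical helpers introduced here.

```python
# Residue counts as reported above for GLCTK_HUMAN (Q8IVS8).
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}
MOLECULAR_WEIGHT = 55252.6  # value reported above


def composition_percent(counts):
    """Percentage of each residue type, rounded to one decimal."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


def extinction_coefficient(counts, all_cys_paired):
    """Molar extinction coefficient at 280 nm (M-1 cm-1).

    Assumed per-residue values: Trp 5500, Tyr 1490, cystine 125.
    """
    cystines = counts["C"] // 2 if all_cys_paired else 0
    return 5500 * counts["W"] + 1490 * counts["Y"] + 125 * cystines


def absorbance_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    print(sum(COUNTS.values()))
    print(composition_percent(COUNTS)["A"])
    for paired in (True, False):
        eps = extinction_coefficient(COUNTS, paired)
        print(eps, round(absorbance_1_g_per_l(eps, MOLECULAR_WEIGHT), 3))
```

With the counts listed above, both extinction coefficients and both absorbance values agree with the reported figures, which also confirms where the decimal points belong.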
Secondary structure prediction
By SOPMA
Result for: UNK_158250
Sequence length: 523
SOPMA:
Alpha helix     (Hh): 235 is 44.93%
3-10 helix      (Gg):   0 is  0.00%
Pi helix        (Ii):   0 is  0.00%
Beta bridge     (Bb):   0 is  0.00%
Extended strand (Ee):  80 is 15.30%
Beta turn       (Tt):  36 is  6.88%
Bend region     (Ss):   0 is  0.00%
Random coil     (Cc): 172 is 32.89%
Ambiguous states (?):   0 is  0.00%
Other states        :   0 is  0.00%
Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
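The pairwise scores in the table are percentage-identity style values. As a rough illustration of how such a number is obtained from two aligned sequences, the sketch below counts identical residues over the columns where neither sequence has a gap. This is an assumption about the scoring, since the report does not state the exact denominator ClustalW uses, and percent_identity is a hypothetical helper. The two fragments are the first 30 residues of P17099|HBEAG_HBVA4 and Q81105|HBEAG_HBVA5 as they appear in the alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Aligned fragments (residues 1-30) taken from the alignment block.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
    hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
    print(round(percent_identity(hbva4, hbva5), 2))
```

The fragment differs at a single position (V versus F), which is in line with the high scores reported for the HBV pairs over their full length.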
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
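The guide tree is a Newick-format string of the kind ClustalW writes to its .dnd file. The sketch below reads the leaf names and their branch lengths from it. Note the assumption: the colons, commas and decimal points in GUIDE_TREE were restored from the digit runs in the report, taking each six-digit run as a branch length with five decimals. The helper names leaf_branch_lengths and is_balanced are hypothetical.

```python
import re

# Guide tree in Newick form. Punctuation is an assumption: it was restored
# from the flattened digit runs in the report (e.g. 059519 read as 0.59519).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name directly after "(" or ",", followed by ":length".
# Internal branch lengths follow ")" and are therefore not matched.
LEAF = re.compile(r"(?<=[(,])([^(),:;]+):([0-9.]+)")


def leaf_branch_lengths(newick):
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


def is_balanced(newick):
    """Check that every opening parenthesis has a matching close."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    leaves = leaf_branch_lengths(GUIDE_TREE)
    for name, length in sorted(leaves.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {length:.5f}")
```

Sorted this way, the human glycerate kinase sits on by far the longest branch, consistent with its pairwise scores of 2 to 3 against every HBeAg sequence.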
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (pdb file).
2. Choose Swiss model, then load the raw target sequence.
3. Choose Fit, then fit raw sequence, then magic fit, then iterative fit.
4. Choose File, then Save, then Layer (pdb).
5. Choose File, then Save, then Project (pdb).
6. Choose Swiss model, then submit modeling request. (A new browser window opens. Load the pdb file and give the email ID for receiving the modeled structure.)
7. Open the new structure (received by email) and remove the template by selecting the target.
8. Choose File, then Save, then Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address:
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible conformations for a polypeptide. Residues whose φ and ψ angles fall outside the allowed regions point to possible errors in the backbone geometry of the model.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
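The φ and ψ values on a Ramachandran plot are dihedral angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four 3D points. It is illustrative only and is not how SAVS or PROCHECK is implemented. The function names are hypothetical and the coordinates in the example are made-up test points, not atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: the central bond runs along the z axis.
    p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))   # cis arrangement
    print(dihedral(p0, p1, p2, (-1.0, 0.0, 1.0)))  # trans arrangement
    print(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)))   # quarter turn
```

Applying this to every residue of a structure, with the atom quadruples given above, yields the (φ, ψ) pairs that the plot displays.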
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score   %age
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
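The percentages in the PROCHECK summary follow directly from the residue counts, and the 90% guideline can be checked the same way. A minimal sketch, using the counts reported above; region_percentages and meets_guideline are hypothetical helper names.

```python
# Residue counts per Ramachandran region, as reported in the summary above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Share of non-Gly, non-Pro residues in each region, one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if more than `threshold` per cent lie in the most favoured regions."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    print(sum(REGION_COUNTS.values()))
    print(region_percentages(REGION_COUNTS))
    print(meets_guideline(REGION_COUNTS))
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% figure quoted for a good quality structure.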
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln  Etotal    Eshape    Eforce    Eair      RMS              Bumps
 ----  --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that, although they are all closely related, they play an important role in
survival in different host species. It would be interesting to look more
closely at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will become possible in the near future to draw firm conclusions
about the true picture of its evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling. We went a step further and
presented a theoretical model built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone for such
discoveries. The present work may be a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.
[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental
Medicine, Scripps Research Institute, La Jolla, California 92037, USA.
[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia,
Pennsylvania 19111, USA. c_seeger@fccc.edu
[4] plumbed
[5] Howard Hughes Medical Institute, Department of Biochemistry and
Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Reprint requests to: Barry Honig, Howard Hughes Medical Institute,
Department of Biochemistry and Molecular Biophysics, Columbia University,
New York, NY 10032, USA.
[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining
multiple structure and sequence alignments to improve sequence detection
and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl.
Acad. Sci. 98: 14796-14801. [PubMed]
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated
structure-based prediction of functional sites in proteins: Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.
[PubMed]
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller,
W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of
protein database search programs. Nucleic Acids Res. 25: 3389-3402.
[PubMed]
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas,
M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000.
InterPro: An integrated documentation resource for protein families,
domains and functional sites. Bioinformatics
[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics,
Institut Municipal d'Investigació Medica and Universitat Pompeu Fabra,
Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia),
Spain.
[8] Computational Sciences, Department of Chemistry, Nerviano Medical
Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy.
romano.kroemer@sanofi-aventis.com
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Met (M)   12   2.3%
Phe (F)   10   1.9%
Pro (P)   28   5.4%
Ser (S)   27   5.2%
Thr (T)   22   4.2%
Trp (W)    4   0.8%
Tyr (Y)    5   1.0%
Val (V)   38   7.3%
Pyl (O)    0   0.0%
Sec (U)    0   0.0%
    (B)    0   0.0%
    (Z)    0   0.0%
    (X)    0   0.0%
Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
Atomic composition
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
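The atom total reported above can be cross-checked against the molecular formula. The following is a minimal sketch, assuming the formula is written as element symbols each followed by a count; the helper name `parse_formula` is hypothetical and not part of ProtParam.

```python
import re

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C2435H3967N711O719S17'.

    Hypothetical helper for illustration only; it does not handle brackets
    or nested groups.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

# Formula as listed in the atomic composition above.
composition = parse_formula("C2435H3967N711O719S17")
total_atoms = sum(composition.values())

print(composition)
print("Total number of atoms:", total_atoms)
```

The sum of the five element counts should agree with the "Total number of atoms" line of the report.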
Secondary structure prediction
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
Alpha helix (Hh)      235 is 44.93%
3-10 helix (Gg)         0 is  0.00%
Pi helix (Ii)           0 is  0.00%
Beta bridge (Bb)        0 is  0.00%
Extended strand (Ee)   80 is 15.30%
Beta turn (Tt)         36 is  6.88%
Bend region (Ss)        0 is  0.00%
Random coil (Cc)      172 is 32.89%
Ambiguous states        0 is  0.00%
Other states            0 is  0.00%
Parameters: Window width 17; Similarity threshold 8; Number of states 4
Multiple sequence alignment
ClustalW2 Results
1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
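The pairwise scores in the table are percent-identity values. As a rough illustration of how such a score can be obtained from two already-aligned sequences, the sketch below counts identical residues over the columns in which neither sequence has a gap. The function name `percent_identity` is hypothetical, and this is an assumption about the general idea, not a reproduction of ClustalW2's own pairwise alignment stage, so the numbers it gives need not match the table exactly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Fragments of two aligned rows from the alignment in this report
# (HBEAG_HBVA4 and HBEAG_HBVA5, first aligned residues only).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(f"{percent_identity(hbva4, hbva5):.1f}% identity over the fragment")
```

Applied to full-length aligned rows rather than a fragment, the same function gives one number per sequence pair, which is the shape of the scores table.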
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
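A guide tree in this form can be read programmatically. The sketch below assumes the tree is stored as standard Newick text, the layout ClustalW2 normally writes to its guide tree file; the colons, commas and decimal points in the string are restored on that assumption. The function name `leaf_branch_lengths` is hypothetical. It extracts only the terminal branch length of each sequence, not the full tree topology.

```python
import re

# Guide tree as Newick text. Punctuation and decimal points are an assumption
# based on the usual ClustalW2 guide tree layout.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick):
    """Map each leaf name to the length of the branch directly above it.

    Internal nodes are unnamed in this tree, so a 'name:length' pair can only
    belong to a leaf.
    """
    pairs = re.findall(r"([^():,;\s]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {length:.5f}")
```

Sorting by terminal branch length puts the most divergent sequence first, which is a quick way to spot an outlier before drawing the phylogram.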
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose icon - Swiss model - load the raw target sequence.
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - File - Save - Layer (pdb).
Choose icon - File - Save - Project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - File - Save - Layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homologous modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. Residues that fall outside the allowed regions of the plot point to backbone conformations that should be checked in the model.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
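The φ and ψ values plotted in a Ramachandran plot are dihedral angles, each defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The following is a minimal sketch of that calculation, assuming the atom coordinates are already available as (x, y, z) tuples. The function name `dihedral` and the toy coordinates are illustrative and are not taken from the modeled structure or from SAVS.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not real atoms): the outer points are twisted by a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real structure the four points would be the backbone atom coordinates read from the PDB file, and each residue would contribute one (φ, ψ) point to the plot.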
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: plot statistics
                                                        Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
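The percentages in the PROCHECK summary follow directly from the residue counts, and they can be compared with the 90% guideline quoted above. A short sketch, using only the counts listed in the summary (the variable and function names are illustrative):

```python
# Residue counts from the PROCHECK summary (non-glycine, non-proline residues).
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

def region_percentages(counts):
    """Convert per-region residue counts to percentages of their total."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}

percentages = region_percentages(region_counts)
for region, pct in percentages.items():
    print(f"{region:>20}: {pct:5.1f}%")

# Guideline quoted in the summary: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("Meets the 90% guideline:", meets_guideline)
```

With these counts the most favoured fraction falls short of the guideline, which suggests the model would benefit from further refinement before it is used for docking.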
Docking result by Hex software
Fig: Ligand & receptor (2B8N)
Fig: after docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
----  --------- --------- --------- --------- ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
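The size of the search space reported in the Hex log can be checked by hand. The short Python sketch below assumes that the digits printed after "Iterate" and "FFT" in the "Total 6D space" line group as [27, 812, 1] and [64, 24, 48]; this grouping is an inference from the other figures in the log (27 distance steps, 812 receptor rotations, 1 ligand rotation), not something the log states explicitly.

```python
from math import prod

# Assumed grouping of the digits in the log's "Total 6D space" line.
iterate = (27, 812, 1)   # distance steps, receptor rotations, ligand rotations
fft_grid = (64, 24, 48)  # dimensions of each 3D FFT (inferred grouping)

# Distance range 0.00 to 19.50 in steps of 0.75, both ends included.
distance_steps = round(19.50 / 0.75) + 1

n_ffts = prod(iterate)                     # number of 3D FFTs performed
n_orientations = n_ffts * prod(fft_grid)   # total orientations sampled

print("distance steps:", distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
```

Under this reading the product reproduces both the 21924 FFTs and the 1616412672 orientations quoted in the log, which supports the grouping but does not prove it.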
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we conclude that, although they are all closely related, they play an important role in survival in different host species. It would be interesting to take a closer look at the matter at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope that it will become possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of the protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.
We studied the HBeAg (glycerate kinase) protein, which is a secondary cause of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analyzing data, as well as the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in the search for a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can hence aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, so that we can look forward to a better, disease-free world.
The work presented is only a small part of a large problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines

Secondary structure prediction
SOPMA result for UNK_158250
Sequence length: 523
SOPMA:
  Alpha helix (Hh):       235 is 44.93%
  3-10 helix (Gg):          0 is  0.00%
  Pi helix (Ii):            0 is  0.00%
  Beta bridge (Bb):         0 is  0.00%
  Extended strand (Ee):    80 is 15.30%
  Beta turn (Tt):          36 is  6.88%
  Bend region (Ss):         0 is  0.00%
  Random coil (Cc):       172 is 32.89%
  Ambiguous states (?):     0 is  0.00%
  Other states:             0 is  0.00%
Parameters: Window width: 17, Similarity threshold: 8, Number of states: 4
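The SOPMA percentages follow directly from the residue counts and the sequence length of 523. The Python sketch below is only an illustrative check of that arithmetic; the variable names are hypothetical and are not part of the SOPMA output.

```python
# Residue counts per predicted state, as reported by SOPMA for the
# 523-residue sequence (states with a count of 0 are omitted).
counts = {
    "Alpha helix (Hh)": 235,
    "Extended strand (Ee)": 80,
    "Beta turn (Tt)": 36,
    "Random coil (Cc)": 172,
}
sequence_length = 523

# Every residue is assigned to exactly one state, so the counts must add up.
assert sum(counts.values()) == sequence_length

# Percentage of the sequence in each state, rounded as in the report.
percentages = {
    state: round(100 * n / sequence_length, 2) for state, n in counts.items()
}

for state, pct in percentages.items():
    print(f"{state:22s} {counts[state]:4d} is {pct:5.2f}%")
```

Nearly 45% of the residues are predicted to be helical and about a third to be random coil, which is the composition the report relies on.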
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
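The pairwise scores are easier to interpret when averaged by group. The Python sketch below copies a subset of the rows from the scores table (sequence numbers as in the SeqA and SeqB columns) and averages them; the grouping into "human HBV entries" (sequences 6 to 10) and the helper name `group_mean` are illustrative choices, not part of the ClustalW output.

```python
from statistics import mean

# Subset of the pairwise scores, keyed by (SeqA, SeqB) number:
# 2 = HBEAG_ASHV, 3 = HBEAG_DHBV1, 6-10 = HBEAG_HBVA2 ... HBEAG_HBVA6.
scores = {
    (2, 6): 66, (2, 7): 65, (2, 8): 65, (2, 9): 65, (2, 10): 65,
    (3, 6): 26, (3, 7): 27, (3, 8): 25, (3, 9): 26, (3, 10): 26,
    (6, 7): 98, (6, 8): 98, (6, 9): 98, (6, 10): 98, (7, 8): 98,
    (7, 9): 97, (7, 10): 98, (8, 9): 97, (8, 10): 98, (9, 10): 97,
}
human_hbv = {6, 7, 8, 9, 10}


def group_mean(group_a, group_b):
    """Mean score over tabulated pairs with one member in each group."""
    values = [
        score
        for (a, b), score in scores.items()
        if (a in group_a and b in group_b) or (a in group_b and b in group_a)
    ]
    return mean(values)


within_human = group_mean(human_hbv, human_hbv)
ashv_vs_human = group_mean({2}, human_hbv)
dhbv1_vs_human = group_mean({3}, human_hbv)

print(within_human, ashv_vs_human, dhbv1_vs_human)
```

On these rows the human HBV entries score in the high nineties against each other, the ASHV entry scores in the mid sixties against them, and the DHBV1 entry scores in the mid twenties, which mirrors the grouping seen in the guide tree.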
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
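A guide tree in Newick form can be inspected with a few lines of code. The Python sketch below assumes the tree is written in standard Newick notation, with the colons, commas and decimal points restored on the assumption that each branch length has five decimal places; it then lists the terminal branch length of every sequence using a regular expression. It is a minimal illustration, not a full Newick parser.

```python
import re

# Guide tree in Newick notation (punctuation restored by assumption).
newick = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name directly followed by ":length"; internal branch lengths
# follow a closing parenthesis instead, so this pattern skips them.
leaf_pattern = re.compile(r"([\w|]+):([\d.]+)")
leaves = {name: float(length) for name, length in leaf_pattern.findall(newick)}

longest = max(leaves, key=leaves.get)
for name, length in sorted(leaves.items(), key=lambda item: item[1]):
    print(f"{name:20s} {length:.5f}")
print("longest terminal branch:", longest)
```

Under this reading the human glycerate kinase sequence sits on by far the longest terminal branch, consistent with its very low pairwise scores against the HBeAg sequences.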
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (PDB file).
2. Choose Swiss model - load the raw target sequence.
3. Choose Fit - fit raw sequence, then magic fit, then iterative fit.
4. Choose File - Save - Layer (PDB).
5. Choose File - Save - Project (PDB).
6. Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.)
7. Open the new structure (received by email) and remove the template by selecting the target.
8. Choose File - Save - Layer (PDB).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Email address: Lakshay1202@gmail.com
Name: Lakshay
Request title: Lakshay project
SWISS-MODEL project file: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide and displays these angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the protein of interest, can be determined using this plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score   %-age
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
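The percentages in the PROCHECK summary can be recomputed from the residue counts and compared with the 90% guideline for the most favoured regions. The Python sketch below is an illustrative check only; the dictionary keys and variable names are hypothetical shorthand for the PROCHECK region labels.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from these statistics.
non_gly_pro = sum(region_counts.values())

region_percent = {
    region: round(100 * n / non_gly_pro, 1) for region, n in region_counts.items()
}

# Total residues = scored residues + end residues + glycines + prolines.
total_residues = non_gly_pro + 8 + 127 + 60

# PROCHECK guideline: a good quality model has over 90% in most favoured regions.
meets_guideline = region_percent["most favoured"] > 90

print(region_percent)
print("total residues:", total_residues)
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the structure falls short of the 90% guideline, and 4.4% of residues lie in disallowed regions.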
Docking result (by Hex software)
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning Can't add all hydrogens to incomplete residue A 253LYS
Warning Can't add all hydrogens to incomplete residue A 316HIS
Warning Can't add all hydrogens to incomplete residue A 318LYS
Warning Can't add all hydrogens to incomplete residue A 393LYS
Warning Can't add all hydrogens to incomplete residue B 8LYS
Warning Can't add all hydrogens to incomplete residue B 9LYS
Warning Can't add all hydrogens to incomplete residue B 17LYS
Warning Can't add all hydrogens to incomplete residue B 34LYS
Warning Can't add all hydrogens to incomplete residue B 36ASN
Warning Can't add all hydrogens to incomplete residue B 62LYS
Warning Can't add all hydrogens to incomplete residue B 65ARG
Warning Can't add all hydrogens to incomplete residue B 66LYS
Warning Can't add all hydrogens to incomplete residue B 316HIS
Warning Can't add all hydrogens to incomplete residue B 370LYS
Warning Can't add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
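The size of the search reported in the log can be checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions about figures whose punctuation was lost in extraction: the distance range is read as 0.00 to 19.50 in steps of 0.75, the "Iterate" figure as the triple (27, 812, 1), and the "FFT" figure as the triple (64, 24, 48). The helper name `radial_steps` is hypothetical and is not part of Hex.

```python
from math import prod


def radial_steps(r_min, r_max, step):
    """Number of sampled distances from r_min to r_max, both ends included."""
    return round((r_max - r_min) / step) + 1


# Assumed readings of the garbled log figures (see the paragraph above).
# iterate = (distance steps, receptor rotations, ligand rotations)
iterate = (radial_steps(0.0, 19.5, 0.75), 812, 1)
fft_dims = (64, 24, 48)

n_ffts = prod(iterate)                     # number of 3D FFTs performed
n_orientations = n_ffts * prod(fft_dims)   # total orientations scanned

print(n_ffts)          # 21924
print(n_orientations)  # 1616412672
```

Under these readings the two products reproduce the "21924 3D FFTs" and "1616412672 orientations" lines of the log, which supports the assumed punctuation.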
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude
that although they are all closely related, each plays an important role in
survival in a different host species. A closer look at the gene level would
be worthwhile, and a phylogenetic analysis can be very helpful in
understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw a conclusive picture of its evolution in the near future
and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and went a step further to
present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is a secondary cause of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand that could inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data as well as the
interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in the search for a
cure for the infectious disease bird flu, and may also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, so we can look forward to a better, disease-free
world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We hope, however, that these findings will go
a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Parameters: window width 17, similarity threshold 8, number of states 4
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
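The pairwise scores in the Scores Table already hint at how the sequences group. As a rough illustration, the sketch below applies single-linkage grouping to those 45 scores: two sequences fall into the same group when they are connected by a chain of pairs scoring at or above a cutoff. The cutoff of 90 and the helper name `group_by_score` are assumptions made for this example; this is not the method ClustalW2 itself uses to build its guide tree.

```python
# Sequence numbers as used in the Scores Table.
NAMES = {
    1: "GLCTK_HUMAN", 2: "HBEAG_ASHV", 3: "HBEAG_DHBV1", 4: "HBEAG_DHBV3",
    5: "HBEAG_GSHV", 6: "HBEAG_HBVA2", 7: "HBEAG_HBVA3", 8: "HBEAG_HBVA4",
    9: "HBEAG_HBVA5", 10: "HBEAG_HBVA6",
}

# (SeqA, SeqB, score) copied from the Scores Table.
SCORES = [
    (1, 2, 3), (1, 3, 3), (1, 4, 3), (1, 5, 3), (1, 6, 3),
    (1, 7, 3), (1, 8, 3), (1, 9, 3), (1, 10, 2),
    (2, 3, 21), (2, 4, 21), (2, 5, 91), (2, 6, 66), (2, 7, 65),
    (2, 8, 65), (2, 9, 65), (2, 10, 65),
    (3, 4, 97), (3, 5, 24), (3, 6, 26), (3, 7, 27), (3, 8, 25),
    (3, 9, 26), (3, 10, 26),
    (4, 5, 25), (4, 6, 26), (4, 7, 27), (4, 8, 25), (4, 9, 26), (4, 10, 26),
    (5, 6, 70), (5, 7, 69), (5, 8, 69), (5, 9, 69), (5, 10, 69),
    (6, 7, 98), (6, 8, 98), (6, 9, 98), (6, 10, 98),
    (7, 8, 98), (7, 9, 97), (7, 10, 98),
    (8, 9, 97), (8, 10, 98),
    (9, 10, 97),
]


def group_by_score(scores, n, cutoff):
    """Single-linkage grouping: merge sequences linked by a score >= cutoff."""
    parent = list(range(n + 1))  # union-find forest; index 0 is unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b, score in scores:
        if score >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(1, n + 1):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


groups = group_by_score(SCORES, n=10, cutoff=90)  # cutoff is an assumed value
for members in groups:
    print([NAMES[i] for i in members])
```

With this cutoff the human glycerate kinase stays on its own, the two squirrel sequences (ASHV, GSHV) pair up, the two duck sequences (DHBV1, DHBV3) pair up, and the five HBVA sequences form one group.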
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
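To make the alignment easier to interpret, the following sketch computes percent identity between two aligned rows. It is a minimal illustration, assuming identity is counted only over columns where both sequences have a residue; other definitions exist, and the ClustalW2 pairwise score is computed differently. The function name `percent_identity` is hypothetical, and the two rows are the HBVA4 and HBVA5 lines of the second alignment block.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Rows copied from the second block of the alignment.
hbva4 = "----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"

print(round(percent_identity(hbva4, hbva5), 1))  # 96.7
```

Here 30 columns are compared and 29 of them match, the single difference being V versus F.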
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
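A guide tree of this kind is written in Newick notation, where nested parentheses mark clades and the number after each colon is a branch length. The sketch below is a small hand-written parser for that notation, offered as an illustration only: `parse_newick` and `tip_distances` are hypothetical helpers, and the sample string is the ASHV/GSHV clade of the guide tree under the assumption that the punctuation lost in extraction was colons, commas and decimal points.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip()
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1                      # consume ":"
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def tip_distances(tree, base=0.0):
    """Map each leaf name to its summed branch length from the top of the tree."""
    name, length, children = tree
    total = base + length
    if not children:
        return {name: total}
    distances = {}
    for child in children:
        distances.update(tip_distances(child, total))
    return distances


# Assumed reconstruction of one clade from the guide tree.
clade = "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844;"
for leaf, dist in tip_distances(parse_newick(clade)).items():
    print(leaf, round(dist, 5))
```

For real work an established phylogenetics library would be a safer choice than a hand-rolled parser, since full Newick also allows quoted labels and comments, which this sketch does not handle.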
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
1. Open the template structure from file (PDB file), then choose Swiss model and load the raw target sequence.
2. Choose Fit: fit raw sequence, then magic fit, then iterative fit.
3. Choose File, Save, Layer (PDB).
4. Choose File, Save, Project (PDB).
5. Choose Swiss model, submit modeling request. A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.
6. Open the new structure (received by email) and remove the template by selecting the target.
7. Choose File, Save, Layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Email address: Lakshay1202gmailcom (must be correct)
Name: Lakshay
Request title: Lakshay project (will be added to the results header)
SWISS-MODEL project file: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044, Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process.
The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure, with one point per residue. It shows which combinations of φ and ψ are possible for a polypeptide, so residues that fall outside the allowed regions point to likely errors in the backbone geometry of the model.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.
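Each point on a Ramachandran plot comes from two dihedral (torsion) angles: φ is defined by the backbone atoms C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The sketch below shows one common way to compute a signed dihedral from four points. It is illustrative only: the function name `dihedral` is hypothetical, the coordinates in the usage lines are made-up test points rather than atoms from the model, and this is not the code SAVS or PROCHECK runs.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up test points: the outer atoms are eclipsed, then at a right angle.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))  # 0.0
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))  # 90.0
```

To obtain φ and ψ for a residue, the same function would be called twice with the coordinates of the four backbone atoms named above.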
SAVES results for proj_gunjanpdb
Procheck summary
RAMCHANDRAN POLT
72
Result
Plot statistics
                                                        Residues   %-age
Residues in most favoured regions [A,B,L]                    990    85.6
Residues in additional allowed regions [a,b,l,p]             104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]          11     1.0
Residues in disallowed regions                                51     4.4
                                                            ----   -----
Number of non-glycine and non-proline residues              1156   100.0
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
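The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The short Python sketch below reproduces that arithmetic from the four counts reported above; the function name and the dictionary layout are assumptions made for illustration, not part of PROCHECK.

```python
def ramachandran_percentages(counts):
    """Convert per-region residue counts into percentages of their total."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts taken from the PROCHECK plot statistics of the model.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

percentages = ramachandran_percentages(counts)

# Guideline quoted by PROCHECK: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
```

With these counts the most favoured fraction falls below the 90% guideline, so by this criterion the model would not yet count as a good quality structure and the residues in disallowed regions deserve a closer look.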
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
-------------------------------------------------------------------------
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---------------------------------------------------------------------------
Clst Soln Models  Etotal Eshape Eforce  Eair Vshape Vclash Bmp   RMS
---------------------------------------------------------------------------
   1    1 000000     0.0    0.0    0.0   0.0    0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000     0.0    0.0    0.0   0.0    0.0    0.0  -1 -1.00
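The size of the search reported in the Hex log can be cross-checked with a few lines of arithmetic. The Python sketch below is only a consistency check, not part of Hex. It assumes that the fused log figures "Iterate[278121] x FFT[642448]" split into (27, 812, 1) and (64, 24, 48), and that the distance range runs from 0.00 to 19.50 in steps of 0.75; the variable and function names are illustrative.

```python
from math import prod

# Assumed reading of the fused figures in the log:
# 27 distance steps, 812 receptor rotations, 1 ligand rotation.
iterate = (27, 812, 1)
# Assumed dimensions of each 3D FFT grid.
fft_grid = (64, 24, 48)


def distance_steps(start, stop, step):
    """List the intermolecular distances sampled between start and stop."""
    count = round((stop - start) / step) + 1
    return [start + i * step for i in range(count)]


radii = distance_steps(0.0, 19.50, 0.75)

# Number of 3D FFTs performed and total orientations evaluated.
fft_count = prod(iterate)
total_orientations = fft_count * prod(fft_grid)
```

Under these assumptions the number of distance steps, the FFT count and the total number of orientations all agree with the values printed in the log (27 values of R, 21924 3D FFTs and 1616412672 orientations), which supports this reading of the garbled figures.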
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope it
will be possible in the near future to draw firm conclusions about the true
picture of evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling. We took a step forward by
presenting a theoretical model built with the available online modelling
tools.
Our study indicates that the HBeAg (Glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists to
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu; they will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Scores Table
SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
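The scores in the table are pairwise similarity scores reported by CLUSTAL. As a rough illustration of what such a score measures, the Python sketch below computes a simple percent identity between two aligned sequences, counting only columns where neither sequence has a gap. This is an assumed simplification: CLUSTAL derives its scores from separate pairwise alignments, so values computed this way from the multiple alignment may differ from the table. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns with no gap in either row."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First aligned segment of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment;
# the two rows differ at a single position (V versus F).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
segment_identity = percent_identity(hbva4, hbva5)
```

The high scores among the HBVA sequences and the very low scores against GLCTK_HUMAN in the table reflect the same idea: closely related sequences share most aligned residues, while unrelated ones share almost none.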
Alignment
CLUSTAL 205 multiple sequence alignment
P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------
58
P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
59
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120 P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
60
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240 P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
61
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
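How similar two aligned rows are can be checked by counting identical residues column by column. The following is a minimal sketch, not the scoring method used by CLUSTAL itself: the function name `percent_identity` is hypothetical, and the choice to skip every column that contains a gap is an assumption. The three test strings are the C-terminal segments of HBVA4, HBVA3 and ASHV from the alignment above, with trailing gap columns trimmed.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# C-terminal segments copied from the alignment (trailing gaps trimmed)
hbva4 = "RRRSQSPRRRRSQSRESQC"
hbva3 = "RRRSPSPRRRRSQSRESQC"
ashv = "RRRSQSPRRR-PQSPASNC"

print(f"HBVA4 vs HBVA3: {percent_identity(hbva4, hbva3):.1f}")
print(f"HBVA4 vs ASHV:  {percent_identity(hbva4, ashv):.1f}")
```

Because only a short segment is compared here, these values are not expected to equal the whole-sequence pairwise scores reported by the alignment program.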
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
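A guide tree of this kind is normally written in Newick notation, where each name is followed by `:` and a branch length. The sketch below is a small hand-written parser, not part of any library, and the names `parse_newick` and `leaf_depths` are hypothetical. The example string is a three-leaf subtree built from the labels above; its branch lengths assume that the six-digit runs in the extracted text (for example 059519) stand for decimals such as 0.59519.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, above=0.0):
    """Return {leaf name: summed branch length from the root}."""
    name, length, children = tree
    here = above + length
    if not children:
        return {name: here}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, here))
    return depths


# Assumed reconstruction of one subtree; decimal points are not in the extracted text.
example = ("((P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054,"
           "Q8IVS8|GLCTK_HUMAN:0.59519);")
for leaf, depth in leaf_depths(parse_newick(example)).items():
    print(f"{leaf}\t{depth:.5f}")
```

Leaves with a small root-to-leaf distance between them, such as the two duck hepatitis B entries, sit close together in the tree.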
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (PDB).
Choose File - save - project (PDB).
Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, and displays the ψ and φ backbone conformational angles for each residue in the protein. The distance between two successive alpha carbon atoms in the backbone chain, and the angles between the two bonds of such atoms in the protein of interest, can be determined using this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
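The φ and ψ values plotted in a Ramachandran diagram are ordinary dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below shows one common way to compute such an angle from four 3D points. It is an illustration only, not the code used by SAVS or PROCHECK; the function name `dihedral` and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return tuple(x * s for x in a)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond, range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the outer atoms are rotated a quarter turn apart
# around the central bond, which runs along the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real structure, the four points would be the atom coordinates read from the PDB file of the model, and each residue would contribute one (φ, ψ) pair to the plot.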
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
                                                          ----   -----
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
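The percentages in the PROCHECK summary can be re-derived from the residue counts, which also makes the comparison with the 90% guideline explicit. This is a small arithmetic sketch; it assumes the counts above read as 990, 104, 11 and 51 residues (with 8 end-residues, 127 glycines and 60 prolines), and the variable names are hypothetical.

```python
# Residue counts per Ramachandran region, as read from the PROCHECK summary
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the denominator for the percentages
scored_residues = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / scored_residues, 1)
    for region, count in region_counts.items()
}

# End-residues, glycines and prolines are counted separately
total_residues = scored_residues + 8 + 127 + 60

# Guideline quoted in the summary: over 90% in the most favoured regions
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {region_counts[region]:5d} {pct:6.1f}")
print("scored residues:", scored_residues, "total residues:", total_residues)
print("over 90% in most favoured regions:", meets_guideline)
```

Under these assumptions the model falls short of the 90% guideline for the most favoured regions, so the residues in the disallowed regions would be the natural place to start any refinement.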
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
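The size of the search reported in the Hex log can be cross-checked with a few lines of arithmetic. The sketch below assumes that the distance range in the log reads 0.00 to 19.50 in steps of 0.75, and that the bracketed figures read Iterate[27,812,1] and FFT[64,24,48]; these groupings are an interpretation of the extracted digits, chosen because they reproduce the totals printed in the log (21924 3D FFTs and 1616412672 orientations). The variable names are hypothetical.

```python
from math import prod

# Radial search range, assumed to read 0.00 to 19.50 in steps of 0.75
distance_start, distance_stop, distance_step = 0.00, 19.50, 0.75
radial_steps = round((distance_stop - distance_start) / distance_step) + 1

# Assumed grouping: radial steps x receptor rotations x ligand rotations
iterate = (radial_steps, 812, 1)
# Assumed grouping of the FFT grid dimensions
fft_grid = (64, 24, 48)

fft_count = prod(iterate)                  # one 3D FFT per iterated combination
orientations = fft_count * prod(fft_grid)  # trial orientations covered in total

print("radial steps:", radial_steps)
print("3D FFTs:", fft_count)
print("orientations:", orientations)
```

That the products agree with the totals in the log supports this reading of the garbled figures, although it does not by itself confirm how Hex defines each factor.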
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will soon be possible to draw firm conclusions about the true
picture of evolution, and that the genes responsible for pathogenesis can
also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of the protein present in the different
species, we carried out homology modelling and took a step towards
presenting a theoretical model using available online modelling tools.
In this study, the HBeAg (glycerate kinase) protein encoded by the gene is
treated as one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report might be just a stepping stone towards
such discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big problem, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and will prove fruitful to anyone working in a similar
area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
2   Q64896|HBEAG_ASHV   217  10  Q91C37|HBEAG_HBVA6  214  65
3   P03154|HBEAG_DHBV1  305  4   P0C6J9|HBEAG_DHBV3  305  97
3   P03154|HBEAG_DHBV1  305  5   P03153|HBEAG_GSHV   217  24
3   P03154|HBEAG_DHBV1  305  6   P0C692|HBEAG_HBVA2  214  26
3   P03154|HBEAG_DHBV1  305  7   P0C625|HBEAG_HBVA3  214  27
3   P03154|HBEAG_DHBV1  305  8   P17099|HBEAG_HBVA4  214  25
3   P03154|HBEAG_DHBV1  305  9   Q81105|HBEAG_HBVA5  214  26
3   P03154|HBEAG_DHBV1  305  10  Q91C37|HBEAG_HBVA6  214  26
4   P0C6J9|HBEAG_DHBV3  305  5   P03153|HBEAG_GSHV   217  25
4   P0C6J9|HBEAG_DHBV3  305  6   P0C692|HBEAG_HBVA2  214  26
4   P0C6J9|HBEAG_DHBV3  305  7   P0C625|HBEAG_HBVA3  214  27
4   P0C6J9|HBEAG_DHBV3  305  8   P17099|HBEAG_HBVA4  214  25
4   P0C6J9|HBEAG_DHBV3  305  9   Q81105|HBEAG_HBVA5  214  26
4   P0C6J9|HBEAG_DHBV3  305  10  Q91C37|HBEAG_HBVA6  214  26
5   P03153|HBEAG_GSHV   217  6   P0C692|HBEAG_HBVA2  214  70
5   P03153|HBEAG_GSHV   217  7   P0C625|HBEAG_HBVA3  214  69
5   P03153|HBEAG_GSHV   217  8   P17099|HBEAG_HBVA4  214  69
5   P03153|HBEAG_GSHV   217  9   Q81105|HBEAG_HBVA5  214  69
5   P03153|HBEAG_GSHV   217  10  Q91C37|HBEAG_HBVA6  214  69
6   P0C692|HBEAG_HBVA2  214  7   P0C625|HBEAG_HBVA3  214  98
6   P0C692|HBEAG_HBVA2  214  8   P17099|HBEAG_HBVA4  214  98
6   P0C692|HBEAG_HBVA2  214  9   Q81105|HBEAG_HBVA5  214  98
6   P0C692|HBEAG_HBVA2  214  10  Q91C37|HBEAG_HBVA6  214  98
7   P0C625|HBEAG_HBVA3  214  8   P17099|HBEAG_HBVA4  214  98
7   P0C625|HBEAG_HBVA3  214  9   Q81105|HBEAG_HBVA5  214  97
7   P0C625|HBEAG_HBVA3  214  10  Q91C37|HBEAG_HBVA6  214  98
8   P17099|HBEAG_HBVA4  214  9   Q81105|HBEAG_HBVA5  214  97
8   P17099|HBEAG_HBVA4  214  10  Q91C37|HBEAG_HBVA6  214  98
9   Q81105|HBEAG_HBVA5  214  10  Q91C37|HBEAG_HBVA6  214  97
===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
62
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360 P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
63
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
P17099|HBEAG_HBVA4 -------------------------------------------
64
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
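As an illustration of how aligned rows such as those above can be compared, the following minimal Python sketch counts identical residues over the columns where both sequences carry a residue. The function name percent_identity and the choice to skip gapped columns are assumptions made for this example; Clustal's own pairwise scores are computed differently, so the figures need not agree. The two rows are taken from the HBVA4 and HBVA5 lines of the alignment, with the leading gap columns trimmed for brevity.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where both aligned rows carry a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip any column that holds a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Rows copied from the alignment (leading gap columns trimmed).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"

print(f"{percent_identity(hbva4, hbva5):.1f}% identity over this block")
```

The two rows differ at a single position (V in HBVA4, F in HBVA5) out of 30 compared residues.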
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
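A guide tree of this kind is normally written in Newick format, and the root-to-leaf distances can be read off with a short parser. The sketch below is only an illustration: the functions parse_newick and leaf_depths are hypothetical helpers written for this example, and the branch lengths assume that each figure in the guide tree is a decimal of the form 0.xxxxx.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    here = depth + length
    if not children:
        return {name: here}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, here))
    return depths


# Guide tree from the report, with decimal points assumed as described above.
guide_tree = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(parse_newick(guide_tree))
for leaf, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {depth:.5f}")
```

Under these assumptions the human glycerate kinase sequence sits far from all the HBeAg sequences, which in turn group by host.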
Phylogram
Tertiary structure prediction
PDB entry 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose icon - Swiss model - load the raw target sequence.
Choose icon - Fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - File - Save - Layer (pdb).
Choose icon - File - Save - Project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID for receiving the modeled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address
Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ angles of each residue and shows which combinations of the two angles are possible for a polypeptide. Both angles are defined around the alpha carbon atom of a residue: φ is the rotation about the N-Cα bond and ψ is the rotation about the Cα-C bond.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
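The backbone angles plotted in a Ramachandran plot are ordinary dihedral angles between four consecutive atoms: φ is defined by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The following Python sketch shows one common way to compute such an angle from Cartesian coordinates. It is not the code used by SAVS or Procheck, and the coordinates are made-up values chosen only to give an easily checked answer.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four points in space."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, not taken from any structure in this report.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, trans)
```

For a real model, the four points would be the atom coordinates read from the PDB file for each residue in turn.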
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
                                                          ----   -----
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
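The percentages in the Procheck summary can be re-derived from the residue counts, which is a quick consistency check on the table. The short Python sketch below is only an illustration: the dictionary keys are labels chosen for this example, and the counts are read as 990, 104, 11 and 51 out of 1156 non-glycine, non-proline residues.

```python
# Residue counts per Ramachandran region, as read from the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

non_gly_pro = sum(counts.values())
percentages = {region: round(100 * n / non_gly_pro, 1) for region, n in counts.items()}

# End residues, glycines and prolines are counted separately by Procheck.
total_residues = non_gly_pro + 8 + 127 + 60

# Procheck's rule of thumb: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("total residues:", total_residues)
print("meets the 90% guideline:", meets_guideline)
```

On these figures the model falls short of the 90% guideline quoted by Procheck, with 85.6% of residues in the most favoured regions.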
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
 ----  --------- --------- --------- ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
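The size of the search space reported in the Hex log can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration. It assumes the figures in the log read as a distance range of 0.00 to 19.50 in steps of 0.75, Iterate[27,812,1] and FFT[64,24,48]. That reading is an interpretation of the extracted text, although it does reproduce both logged totals (21924 FFTs and 1616412672 orientations).

```python
# Intermolecular distance samples: 0.00 to 19.50 in steps of 0.75 (assumed reading).
step = 0.75
distance_steps = [i * step for i in range(int(19.50 / step) + 1)]

# Rotational increments reported in the log for the initial N=16 scan.
receptor_rotations = 812
ligand_rotations = 1

# Each 3D FFT covers a 64 x 24 x 48 grid of orientations (assumed reading).
fft_grid = 64 * 24 * 48

n_ffts = len(distance_steps) * receptor_rotations * ligand_rotations
total_orientations = n_ffts * fft_grid

print("distance steps:", len(distance_steps))
print("3D FFTs:", n_ffts)
print("orientations searched:", total_orientations)
```

Note that the log reports no minima within threshold (Emin = Emax = 0.00) for this run, in which 2B8N was loaded as both receptor and ligand.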
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, each has an important role in survival
in a different host species. It would be interesting to take a closer look at
the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of its evolution
in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structures of proteins present in the different
species, we carried out homology modelling and put forward a theoretical
model built with available online modelling tools.
From our study, the HBeAg (glycerate kinase) protein coded by the gene is one
of the secondary causes of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in the search
for a cure for hepatitis B, and may also open new avenues for finding other
possible drug targets in HBV.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, so that we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for hepatitis B. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
99
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98 6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97 7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
Alignment
CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
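The pairwise scores listed before the alignment are percent identities. As a rough illustration of how such a score is obtained, the sketch below counts identical residues over the columns where neither sequence has a gap. The function name percent_identity is hypothetical, and the calculation is a simplification: CLUSTAL derives its reported scores from its own pairwise alignment stage, so values computed from rows of the multiple alignment may differ slightly. The four fragments are the C-terminal columns of the last full alignment block.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# C-terminal fragments copied from the alignment above.
HBVA4 = "RRRSQSPRRRRSQSRESQC"  # P17099
HBVA3 = "RRRSPSPRRRRSQSRESQC"  # P0C625
ASHV = "RRRSQSPRRR-PQSPASNC"   # Q64896
GSHV = "RRRSQSPRRRRSQSPASNC"   # P03153

print(round(percent_identity(HBVA4, HBVA3), 1))  # 18 of 19 columns match
print(round(percent_identity(ASHV, GSHV), 1))    # 17 of 18 ungapped columns match
```

Applied to full-length rows instead of fragments, the same function gives a quick cross-check of the tabulated scores.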
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
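The guide tree is a Newick string in which each sequence name is followed by its branch length. The sketch below pulls out the leaf names and lengths with a regular expression and reports the most divergent sequence. It assumes that the colons, commas and decimal points lost from the extracted text belong where shown in GUIDE_TREE; the helper leaf_branch_lengths is hypothetical and is not a general Newick parser.

```python
import re

# Guide tree with punctuation restored by assumption (see the note above).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is "ACCESSION|ENTRY_NAME:length"; internal branch lengths follow ")"
# and therefore do not match this pattern.
LEAF = re.compile(r"([A-Z0-9]+\|[A-Z0-9_]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf name to its terminal branch length."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves; longest terminal branch:", longest, lengths[longest])
```

Under these assumptions the human glycerate kinase sequence has by far the longest terminal branch, which is consistent with its low similarity to the viral sequences in the alignment.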
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose icon - Swiss model - load the raw target sequence.
Choose icon - fit - fit raw sequence, then magic fit, then iterative fit.
Choose icon - file - save - layer (pdb).
Choose icon - file - save - project (pdb).
Choose icon - Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose icon - file - save - layer (pdb).
Open Swiss model and select the "load raw sequence" option to load the target molecule.
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Each residue is plotted as one point, so residues whose backbone conformation falls outside the sterically allowed regions can be identified directly from the plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and the modeler is given as input to SAVS.
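To make the φ and ψ angles concrete, the sketch below computes a dihedral angle from four atom positions. For residue i, φ is the dihedral C(i-1)-N(i)-CA(i)-C(i) and ψ is N(i)-CA(i)-C(i)-N(i+1). The function name dihedral and the test coordinates are illustrative only; this is not the code used by PROCHECK or SAVS, and real input would be atom coordinates read from a PDB file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: the fourth atom is rotated a quarter turn
# away from the first about the central bond.
angle = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(round(angle, 1))
```

Computing φ and ψ for every residue and plotting one against the other gives the Ramachandran plot described above.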
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6
Residues in additional allowed regions [a,b,l,p]           104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                              51    4.4
                                                          ----  -----
Number of non-glycine and non-proline residues            1156  100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
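The percentages in the plot statistics follow directly from the residue counts. The short sketch below recomputes them, checks the residue total, and applies the 90% rule of thumb quoted above. The variable and function names are illustrative, and the counts are read from the PROCHECK summary on the assumption that the four-digit figures in the extracted text are counts followed by percentages with the decimal point lost.

```python
# Residue counts per Ramachandran region, taken from the PROCHECK summary.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
END_RESIDUES = 8
GLYCINES = 127
PROLINES = 60


def region_percentages(counts: dict) -> dict:
    """Percentage of non-glycine, non-proline residues in each region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


percentages = region_percentages(COUNTS)
non_gly_pro = sum(COUNTS.values())
total_residues = non_gly_pro + END_RESIDUES + GLYCINES + PROLINES
meets_threshold = percentages["most favoured"] > 90.0

print(percentages)
print("total residues:", total_residues)
print("over 90% in most favoured regions:", meets_threshold)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% expected of a good quality structure, and 4.4% of residues lie in disallowed regions.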
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce    Eair      RMS               Bumps
----   --------- --------- --------- --------- ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
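The size of the search reported in the docking log can be reproduced from the scan settings. The sketch below assumes that the distance range reads 0.00 to 19.50 in steps of 0.75, that the Iterate entry lists the number of radial steps, receptor rotations and ligand rotations (27, 812, 1), and that the FFT entry is a 64 x 24 x 48 grid. These readings are inferred from the extracted text, which has lost its punctuation, and the names below are illustrative rather than part of Hex.

```python
# Scan settings as read from the docking log (punctuation restored by assumption).
R_START, R_END, R_STEP = 0.00, 19.50, 0.75
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)


def radial_steps(start: float, end: float, step: float) -> list:
    """Intermolecular distances sampled from start to end inclusive."""
    count = round((end - start) / step) + 1
    return [round(start + i * step, 2) for i in range(count)]


steps = radial_steps(R_START, R_END, R_STEP)
ffts = len(steps) * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
grid_points = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
orientations = ffts * grid_points

print(len(steps), "radial steps from", steps[0], "to", steps[-1])
print(ffts, "3D FFTs x", grid_points, "grid points =", orientations, "orientations")
```

Under these assumptions the products agree with the 21924 3D FFTs and 1616412672 orientations reported in the log, which supports the restored reading of the settings.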
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they have an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of its
evolution in the near future, and the genes responsible for pathogenesis can
also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we carried out homology modelling. We took a step forward by
presenting a theoretical model built with available online modelling tools.
As we studied, the HBeAg (glycerate kinase) protein encoded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, on the basis of which drugs can be developed.
Future prospects
The work presented in this report might just be a stepping stone for such
discoveries. The present work might be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
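The conservation visible in the alignment can be quantified as pairwise percent identity. The following is a minimal Python sketch and was not part of the original analysis. The function name `percent_identity` is hypothetical, and the convention used here (matches divided by the number of columns in which neither sequence has a gap) is only one of several in common use. The two rows are the HBVA4 and ASHV fragments from the second alignment block.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Rows copied from the second block of the alignment (60 columns each).
hbva4 = "----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
ashv = "----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M"

print(f"HBVA4 vs ASHV: {percent_identity(hbva4, ashv):.1f}% identity")
```

Applied to full-length rows rather than a single block, the same function gives an overall identity figure for any pair of sequences in the alignment.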
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model > load the raw target sequence.
Choose Fit > fit raw sequence, then magic fit, then iterative fit.
Choose File > save > layer (PDB).
Choose File > save > project (PDB).
Choose Swiss model > submit modeling request. A new browser window opens; load the PDB file and give the e-mail ID at which the modelled structure is to be received.
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File > save > layer (PDB).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under Swiss model to submit it for modelling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044  Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modelling process. A Ramachandran plot visualises the backbone dihedral angles of the amino acid residues in a protein structure by plotting φ against ψ for each residue. It shows which combinations of φ and ψ are possible for a polypeptide, so residues whose backbone conformation falls outside the allowed regions can be identified from the plot.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modelling with DeepView and modeler is given as input to SAVS.
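Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). As an illustration only, the Python sketch below computes a signed dihedral angle from four atom positions. The function name `dihedral_degrees` is hypothetical and the coordinates are synthetic, not taken from the modelled structure; SAVS/PROCHECK performs this calculation internally.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Synthetic example: four points forming a right-angle twist.
angle = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(f"dihedral = {angle:.1f} degrees")
```

For φ the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i) read from the PDB file; for ψ they would be N(i), Cα(i), C(i) and N(i+1).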
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
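The percentages in the PROCHECK summary follow directly from the residue counts, and the totals can be cross-checked in a few lines. The Python sketch below is an illustrative check only; the variable names are hypothetical, and the counts are read from the summary on the assumption that the percentage column lost its decimal points in extraction (for example, 856 standing for 85.6).

```python
# Residue counts per Ramachandran region, as reported in the PROCHECK summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Only non-glycine, non-proline residues enter the percentage calculation.
non_gly_pro = sum(region_counts.values())
percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

# Remaining residue classes listed in the summary.
end_residues, glycines, prolines = 8, 127, 60
total_residues = non_gly_pro + end_residues + glycines + prolines

# PROCHECK's rule of thumb: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {region_counts[region]:5d} {pct:6.1f}%")
print(f"non-Gly/non-Pro residues: {non_gly_pro}")
print(f"total residues: {total_residues}")
print(f"over 90% in most favoured regions: {meets_guideline}")
```

By PROCHECK's own guideline, the fraction in the most favoured regions reported here is below the 90% mark expected of a good quality model.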
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
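The size of the search reported in the Hex log can be checked by simple arithmetic. The Python sketch below assumes that the log's "Iterate[...] x FFT[...]" figures read 27, 812, 1 and 64, 24, 48, and that the distance range reads 0.00 to 19.50 in steps of 0.75; the separators and decimal points appear to have been lost in extraction, so these readings are assumptions. The variable names are hypothetical.

```python
from math import prod

# Assumed reading of "Iterate[278121] x FFT[642448]" from the log.
iterate = (27, 812, 1)   # outer loops of the search (27 distance steps first)
fft_grid = (64, 24, 48)  # orientations covered by each 3D FFT

n_ffts = prod(iterate)
total_orientations = n_ffts * prod(fft_grid)

# Assumed reading of "distance range = 000 to 1950 with steps of 075".
r_min, r_max, r_step = 0.0, 19.5, 0.75
n_distance_steps = round((r_max - r_min) / r_step) + 1

print(f"3D FFTs: {n_ffts}")
print(f"total 6D orientations: {total_orientations}")
print(f"distance steps: {n_distance_steps}")
```

Under these readings the products agree with the totals printed in the log (21924 3D FFTs for 1616412672 orientations), and the number of distance steps matches the first iteration count.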
Conclusion
After analysing the protein sequences of hepatitis B virus (HBV), we conclude that although they are all closely related, each plays an important role in survival in a different host species. A closer look at the gene level would be worthwhile, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw firm conclusions about the true picture of evolution in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling and present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards such
discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in the search for a
cure for the infectious disease bird flu, and may also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can thereby
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases to identify possible therapeutic sites in them, so that we
can look forward to a better, disease-free world.
The work presented is only a small part of a large problem, and much work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings will
go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
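How similar two rows of the alignment are can be quantified as percent identity. The following is a minimal sketch, not the method used by the alignment program: the function name percent_identity is hypothetical, and it assumes identity is counted only over columns where neither row has a gap (other definitions divide by the full alignment length). The two example strings are the C-terminal residues of the HBVA4 and HBVA3 rows shown in the alignment, with the trailing gaps left out.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where both aligned rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # gap columns are skipped under this definition
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# C-terminal residues of two rows from the alignment (trailing gaps omitted)
hbva4 = "RRRSQSPRRRRSQSRESQC"
hbva3 = "RRRSPSPRRRRSQSRESQC"
print(round(percent_identity(hbva4, hbva3), 1))  # 18 of 19 columns match
```

The two strains differ at a single column in this segment, which is consistent with the very short branch lengths separating the HBV strains in the guide tree.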
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
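The guide tree is a Newick string, so the branch length attached to each sequence can be read off programmatically. The sketch below is illustrative only: it assumes the punctuation lost in extraction (colons, commas, decimal points and the closing semicolon) is restored in the usual ClustalW style, and the helper name leaf_branch_lengths is hypothetical. It uses a regular expression rather than a full Newick parser, which is enough to list leaves but does not recover the tree topology.

```python
import re

# Guide tree with ':' ',' '.' and ';' restored. This restoration is an
# assumption about the original program output, not a value from a new run.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick: str) -> dict:
    """Map each leaf label to the length of the branch directly above it.

    Internal nodes are unlabeled, so their ':length' is preceded by ')'
    and is not matched by the label pattern.
    """
    pairs = re.findall(r"([A-Za-z0-9_|]+):([0-9.]+)", newick)
    return {label: float(length) for label, length in pairs}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves; longest terminal branch:", longest, lengths[longest])
```

Under these assumptions the human glycerate kinase sequence has by far the longest terminal branch, while the HBV strains sit on branches of about 0.01 or less.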
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose icon: SwissModel - load the raw target sequence
Choose icon: Fit - fit raw sequence, then magic fit, then iterative fit
Choose icon: File - Save - Layer (pdb)
Choose icon: File - Save - Project (pdb)
Choose icon: SwissModel - submit modeling request (a new browser window opens; load the pdb file and give the e-mail ID at which the modelled structure is to be received)
Open the new structure (received by e-mail) and remove the template by selecting the target
Choose icon: File - Save - Layer (pdb)
Open Swiss model and select the load raw sequence option to load the target molecule
Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences
Save the file as the project
Select "submit modeling request" under Swiss model to submit it for modeling
Homology modeling
Optimise Mode request submission form
Your e-mail address: Lakshay1202gmailcom
Your name: Lakshay
Request title: Lakshay project
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044    Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. From the plot one can judge whether the backbone conformation of each residue of the protein of interest falls in a sterically allowed region.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
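Each point of a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute such a torsion angle from Cartesian coordinates. It is an illustration, not the code used by SAVS or PROCHECK; the function names dihedral and phi_psi are hypothetical, and the coordinates in the usage lines are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of one residue from its own and neighbouring atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Made-up points: a cis arrangement (0 degrees) and a trans one (180 degrees)
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, abs(trans))
```

Applying phi_psi to every residue of a model and plotting ψ against φ gives the scatter that validation tools compare with the allowed regions.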
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                           Score     %age
Residues in most favoured regions [A,B,L]                   990     85.6%
Residues in additional allowed regions [a,b,l,p]            104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11      1.0%
Residues in disallowed regions                               51      4.4%
                                                           ----    ------
Number of non-glycine and non-proline residues             1156    100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
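The percentage column of the PROCHECK summary follows directly from the residue counts, and the result can be compared with the 90% guideline quoted above. The short sketch below reproduces that arithmetic; the dictionary keys and the function name region_percentages are illustrative, and only the four counts are taken from the report.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts: dict) -> dict:
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(region_counts.values())
    return {name: 100.0 * n / total for name, n in region_counts.items()}


pct = region_percentages(counts)
for name, value in pct.items():
    print(f"{name:20s} {value:5.1f}%")

# Guideline quoted in the summary: over 90% in the most favoured regions
meets_guideline = pct["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```

With these counts the most favoured regions hold about 85.6% of the 1156 non-glycine, non-proline residues, which is below the 90% guideline.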
Docking result (by Hex software)
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
------------------------------------------------------------------------------
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
------------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
After analyzing the protein sequences of hepatitis B virus (HBV), we
conclude that although the sequences are closely related, they play an
important role in survival in different host species. It would be
interesting to look more closely at the matter at the gene level, where a
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We observed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing HBV gene sequencing project is finished, we hope it will
become possible to draw firm conclusions about the true picture of
evolution, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and present a theoretical model
built with the available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is a secondary contributor to the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards
such discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can
therefore aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, so we can look forward to a better, disease-free
world.
The work presented here is only a small part of a large problem, and much
work remains to be done to establish a sound phylogenetic relationship and
a full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove useful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
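How similar two rows of such an alignment are can be expressed as a percent identity. The following is a minimal sketch, not the method used by the alignment program: the function name `percent_identity` is hypothetical, and it assumes that columns with a gap in either row are skipped, whereas other tools may use a different denominator. The two rows are the HBVA4 and HBVA5 segments ending at position 117.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Assumption: gap columns ('-') are ignored; this is one common convention,
    not necessarily the one used by the alignment program.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(row_a, row_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Segments copied from the alignment block ending at position 117;
# the trailing gap characters are kept to show that they are skipped.
HBVA4_ROW = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
HBVA5_ROW = "QAILCWGKLMTLATWVGNNLEDPASRDLVVNY" + "-" * 28

if __name__ == "__main__":
    print(f"{percent_identity(HBVA4_ROW, HBVA5_ROW):.3f} % identity")
```

The two segments differ at a single position (E versus K), so 31 of the 32 compared residues are identical.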
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
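A guide tree of this kind can be read programmatically. The sketch below assumes the tree is written in Newick notation; the colons, commas and decimal points in `GUIDE_TREE` are a reconstruction of the flattened text, and the helper name `leaf_branch_lengths` is hypothetical. It uses only a regular expression and extracts each leaf with its terminal branch length, not the full tree topology.

```python
import re

# Guide tree in Newick notation. Assumption: punctuation and decimal points
# were restored from the flattened text of the report.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a label that directly follows '(' or ',' and is followed by
# ':<length>'. Internal branch lengths follow ')' and are therefore skipped.
_LEAF = re.compile(r"[(,]([^(),:;]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Map every leaf label to its terminal branch length."""
    return {name: float(length) for name, length in _LEAF.findall(newick)}


if __name__ == "__main__":
    lengths = leaf_branch_lengths(GUIDE_TREE)
    # List leaves from the longest to the shortest terminal branch.
    for name, length in sorted(lengths.items(), key=lambda item: -item[1]):
        print(f"{name:22s} {length:.5f}")
```

Under these assumptions the human glycerate kinase sequence has by far the longest terminal branch, while the HBV entries sit on very short branches.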
Phylogram
Tertiary structure prediction
PDB entry 1QGT, chain C (1QGT-C), was selected as the template because it showed about 85.6% sequence identity with the target sequence, and the template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose SwissModel - load raw sequence to load the target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose SwissModel - submit modeling request. (A new browser window opens. Load the pdb file there and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).
Open SwissModel and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as a project.
Select "submit modeling request" under SwissModel to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure. Each residue is plotted as a point (φ, ψ), and the plot shows which combinations of the two angles are sterically possible for a polypeptide. Residues that fall outside the allowed regions point to strained or incorrectly modeled backbone geometry.
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
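The φ and ψ values plotted in a Ramachandran plot are ordinary dihedral angles between four backbone atoms: φ is defined by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of that calculation and not the code used by SAVS or PROCHECK. The function name `dihedral` and the toy coordinates are assumptions chosen only to make the geometry easy to check by hand.

```python
import math

Point = tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i);
    for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (hypothetical, not taken from the modeled structure):
    # the central bond lies along the z axis.
    a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(dihedral(a, b, c, (1.0, 0.0, 1.0)))  # cis arrangement
    print(dihedral(a, b, c, (0.0, 1.0, 1.0)))  # quarter turn
```

Applying the function to every residue of a structure gives the (φ, ψ) pairs that the validation server bins into favoured, allowed and disallowed regions.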
SAVES results for proj_gunjan.pdb
Procheck summary
Ramachandran plot
Result
Plot statistics                                          Score   %age
Residues in most favoured regions [A,B,L]                  990   85.6
Residues in additional allowed regions [a,b,l,p]           104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0
Residues in disallowed regions                              51    4.4
                                                          ----  -----
Number of non-glycine and non-proline residues            1156  100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
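The percentages in the plot statistics follow directly from the residue counts, and the 90% guideline can be checked the same way. The sketch below only redoes that arithmetic. The names `REGION_COUNTS`, `region_percentages` and `meets_guideline` are hypothetical, and the counts are the ones listed in the summary above.

```python
# Residue counts per Ramachandran region, taken from the Procheck summary.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each region among the non-glycine, non-proline residues."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts: dict[str, int], threshold: float = 90.0) -> bool:
    """True if more than `threshold` percent lie in the most favoured regions."""
    return region_percentages(counts)["most favoured"] > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(f"{region:20s} {pct:5.1f} %")
    print("over 90% in most favoured regions:", meets_guideline(REGION_COUNTS))
```

With these counts the most favoured regions hold about 85.6% of the residues, which is below the 90% expected of a good quality model.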
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
Conclusion

After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different host species. It would be interesting to examine the matter more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that the strains evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will become possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we carried out homology modelling and present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects

The work presented in this report may serve as a stepping stone towards such discoveries; it is a small finding within a large problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analyzing data as well as the interpretation of those results as new biological information.

The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analyzing the modeled structure of the protein.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study will be useful in the search for a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a large problem, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations

CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
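How closely two rows of the alignment above agree can be expressed as a pairwise percent identity. The Python sketch below is illustrative only: the helper name percent_identity is hypothetical, and skipping gap columns is an assumption (other conventions divide by the full alignment length). The two rows are the ungapped HBVA2 and HBVA3 segments from the first block shown.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Segments copied from the P0C692|HBEAG_HBVA2 and P0C625|HBEAG_HBVA3 rows.
hbva2 = "QAILCWGELMTLATWVGNNLQDPASRDLVVNY"
hbva3 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
identity = percent_identity(hbva2, hbva3)
print(f"{identity:.1f}% identity over {len(hbva2)} columns")
```

The two segments differ at a single column (Q versus E), which is the kind of near-identity the guide tree reflects in the very short branches among the HBV strains.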
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
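The guide tree is written in Newick notation, so the branch lengths can be read programmatically. The following is a minimal sketch, not the method used in the report: parse_newick and leaf_depths are hypothetical helper names, and the tree string assumes that the colons, commas and decimal points lost in extraction are restored as five-decimal ClustalW branch lengths.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",:();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Map each leaf name to the summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


depths = leaf_depths(parse_newick(GUIDE_TREE))
for leaf, d in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{leaf:20s} {d:.5f}")
```

Sorting by root-to-leaf distance places the human glycerate kinase sequence far from the HBeAg sequences, with the duck and squirrel virus sequences in between.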
Fig. Phylogram

Tertiary structure prediction

PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV
Open the template structure from file (pdb file).
Choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens; load the pdb file and give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).

Fig. Open Swiss model and select the load raw sequence option to load the target molecule.
Fig. Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Fig. Save the file as the project.
Fig. Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling

Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52

Fig. Structure of template after modeling
Model Validation

INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Residues whose angles fall outside the allowed regions of the plot point to stereochemical problems in the model backbone.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS
Procedure: The target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.
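Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, CA, C and ψ from N, CA, C, N(i+1). The sketch below shows one common way to compute such a torsion angle in plain Python. It is an illustration under stated assumptions, not the code used by SAVS or PROCHECK: the function name dihedral and the toy coordinates are hypothetical, and real input would be atom coordinates read from the model's PDB file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): four atoms forming a +90 degree torsion.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"torsion = {angle:.1f} degrees")
```

Applying the function to the appropriate atom quadruples of every residue gives the (φ, ψ) pairs that the plot classifies as favoured, allowed or disallowed.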
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT

Result: Plot statistics
                                                       Score   %-age
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%

Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
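The percentages in the Procheck summary follow directly from the residue counts, which also allows a quick consistency check of the table. The short Python sketch below is illustrative; the variable names are hypothetical, and the counts are the ones reported above, with decimal points restored on the assumption that they were lost in extraction.

```python
# Residue counts per Ramachandran region, as reported by Procheck above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Non-glycine, non-proline residues are the denominator for the percentages.
non_gly_pro = sum(counts.values())
percentages = {
    region: round(100.0 * n / non_gly_pro, 1) for region, n in counts.items()
}

# End residues, glycines and prolines are counted separately.
end_residues, glycines, prolines = 8, 127, 60
total_residues = non_gly_pro + end_residues + glycines + prolines

# Procheck's rule of thumb: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("total residues:", total_residues)
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls somewhat short of the 90% guideline quoted above, and 4.4% of residues lie in disallowed regions.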
Docking result by Hex software

Fig. Ligand & Receptor (2B8N)

Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
-------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
 ----   ------   ------   ------   ----   ---   -----
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
-------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
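The figures in the Hex log above are mutually consistent, and checking them is a useful way to understand what the search did. The Python sketch below is only a worked arithmetic check, not part of Hex: the variable names are hypothetical, and the decimal points in the distance range and stage timings are assumed to have been lost in extraction (the timings then sum to almost exactly the reported 7 min 0 sec).

```python
# Distance scan reported in the log: 0.00 to 19.50 in steps of 0.75.
r_start, r_stop, r_step = 0.00, 19.50, 0.75
n_distances = round((r_stop - r_start) / r_step) + 1

# Totals reported in the log.
n_ffts = 21924
n_orientations = 1616412672

# One 3D FFT per (distance step, receptor rotation) pair.
receptor_rotations = n_ffts // n_distances
# Orientations evaluated by each 3D FFT.
orientations_per_fft = n_orientations // n_ffts

# Stage timings in seconds (decimal points assumed).
stage_seconds = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(stage_seconds.values())
rate_per_second = n_orientations / total_seconds

print("distance steps:", n_distances)
print("receptor rotations:", receptor_rotations)
print("orientations per FFT:", orientations_per_fft)
print(f"total time: {total_seconds:.2f} s")
print(f"search rate: {rate_per_second:,.0f} orientations/s")
```

The search therefore covered 27 distance steps times 812 receptor rotations, with each FFT scoring 73728 orientations, at a rate close to the 3848574 orientations per second that the log reports.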
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
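The guide tree above is written in Newick notation, so its branch lengths can be read programmatically. The following is a minimal sketch of a recursive parser, not the code used by the alignment program. The tree string assumes that the colons, commas, and decimal points lost in extraction are restored as shown, and the names `parse_newick` and `leaf_depths` are hypothetical helpers introduced only for this illustration.

```python
# Guide tree with separators restored (an assumption about the lost punctuation).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children."""
    pos = 0

    def read_until(stops):
        # Consume characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": 0.0, "children": []}
        if text[pos] == "(":
            pos += 1  # consume '('
            node["children"].append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                node["children"].append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        node["name"] = read_until(",():;")
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ':'
            node["length"] = float(read_until(",();"))
        return node

    return parse_node()


def leaf_depths(node, depth=0.0):
    """Yield (leaf name, summed branch length from the root) for every leaf."""
    depth += node["length"]
    if not node["children"]:
        yield node["name"], depth
    for child in node["children"]:
        yield from leaf_depths(child, depth)


if __name__ == "__main__":
    tree = parse_newick(GUIDE_TREE)
    for name, depth in sorted(leaf_depths(tree), key=lambda item: item[1]):
        print(f"{name:20s} {depth:.5f}")
```

Sorting the leaves by their distance from the root places the human glycerate kinase sequence far from the HBeAg sequences, which cluster near the root.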
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% sequence identity with the target sequence. The template structure was downloaded from the PDB.
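Sequence identity between two aligned rows can be computed directly from an alignment. The sketch below is only illustrative: it counts identical residues over the columns where neither row has a gap, which is one common convention and not necessarily the one used by the tool that reported the template identity. The function name `percent_identity` is hypothetical, and the example rows are short fragments taken from the multiple alignment in this report.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row is gapped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # C-terminal fragments of HBEAG_HBVA4 and HBEAG_HBVA3 from the alignment.
    hbva4 = "RRRSQSPRRRRSQSRESQC"
    hbva3 = "RRRSPSPRRRRSQSRESQC"
    print(f"HBVA4 vs HBVA3: {percent_identity(hbva4, hbva3):.1f}%")

    # Fragments of HBEAG_ASHV and HBEAG_GSHV; the gapped column is ignored.
    ashv = "RRRSQSPRRR-PQSPASNC"
    gshv = "RRRSQSPRRRRSQSPASNC"
    print(f"ASHV vs GSHV: {percent_identity(ashv, gshv):.1f}%")
```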
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens for loading the pdb file; give the email ID at which the modeled structure is to be received.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode request submission form
Request title: Lakshay project
Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure and shows which combinations of φ and ψ are possible for a polypeptide. Because these two angles describe the rotation about the bonds on either side of each alpha carbon atom, the plot indicates whether the backbone conformation of the protein of interest is sterically reasonable.
Software
SAVS (http://nihserver.mbi.ucla.edu/SAVS)
Procedure
The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.
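As an illustration of how the angles on a Ramachandran plot are obtained, φ for residue i is the dihedral angle defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ is the dihedral defined by N(i), CA(i), C(i), N(i+1). The sketch below computes a dihedral angle from four Cartesian coordinates with the standard atan2 formulation. The function name `dihedral` and the coordinates are illustrative assumptions and are not taken from the modeled structure, and the sign of the result follows this formula rather than any particular validation program.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the fourth atom is rotated a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    # Toy coordinates for a fully extended (trans) arrangement.
    print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

Applying the function to the backbone atoms of every residue gives the (φ, ψ) pairs that a validation server plots.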
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
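The percentages in the Procheck summary follow from the residue counts, which makes them easy to check. The sketch below recomputes them from the four region counts reported above and compares the most favoured fraction with the 90% guideline quoted in the summary. The function names are hypothetical and the code is not part of Procheck.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-glycine, non-proline residues in each region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if more than `threshold` percent of residues are most favoured."""
    return region_percentages(counts)["most favoured"] > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(COUNTS).items():
        print(f"{region:20s} {pct:5.1f}%")
    print("meets 90% guideline:", meets_guideline(COUNTS))
```

With these counts the most favoured fraction is below the guideline, so the model would normally be refined or interpreted with caution.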
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
Clst  Soln  Models   Etotal   Eshape   Eforce   Eair     Vshape   Vclash  Bmp  RMS
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
1     1     000000   00       00       00       00       00       00      -1   -100
---------------------------------------------------------------------------
1     1     000000   00       00       00       00       00       00      -1   -100
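The size of the search reported in the docking log can be checked with simple arithmetic. The sketch below rests on an assumed reading of a log whose punctuation was lost in extraction: the distance range is taken as 0.00 to 19.50 in steps of 0.75, "Iterate[278121]" is read as [27, 812, 1] (distance steps, receptor rotations, ligand rotations), and "FFT[642448]" is read as a 64 x 24 x 48 grid. Under this reading the totals printed in the log (21924 FFTs and 1616412672 orientations) are reproduced, but the variable names are illustrative and this is not Hex code.

```python
def distance_steps(r_min, r_max, step):
    """Number of sampled intermolecular distances, both end points included."""
    return round((r_max - r_min) / step) + 1


# Assumed reading of the log values (decimal points and commas restored).
N_DISTANCES = distance_steps(0.0, 19.50, 0.75)
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)

# One 3D FFT per combination of distance and starting rotations.
n_ffts = N_DISTANCES * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS

# Each FFT scores every cell of the rotational grid at once.
orientations = n_ffts * FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]

if __name__ == "__main__":
    print("distance steps:", N_DISTANCES)
    print("3D FFTs:", n_ffts)
    print("orientations:", orientations)
```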
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude that, although the proteins are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at the matter at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved together.
Once the ongoing HBV gene sequencing project is finished, we hope it will be possible in the near future to draw firm conclusions about the true picture of evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene products and their functions.
To determine the unknown structure of a protein present in the different species, we performed homology modelling, and we went a step further to present a theoretical model built with available online modelling tools.
Our study suggests that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which can serve as a basis for drug development.
Future prospects
The work presented in this report may be a stepping stone for such discoveries; it is a small finding within a large problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analyzing data as well as the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analyzing the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them, and to other infectious diseases, so that we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
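As a small illustration of how rows of such an alignment can be compared, the sketch below computes percent identity between two aligned rows. The function name percent_identity and the choice to skip columns containing a gap are assumptions made for this example, not the method used by the alignment program. The rows are only the 19-residue C-terminal segments from the alignment block above, so the figures are not whole-sequence identities.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where both aligned rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # a gap in either row: column is not compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# C-terminal segments copied from the alignment block (trailing gaps removed).
hbva4 = "RRRSQSPRRRRSQSRESQC"
hbva3 = "RRRSPSPRRRRSQSRESQC"
ashv = "RRRSQSPRRR-PQSPASNC"

print(f"HBVA4 vs HBVA3: {percent_identity(hbva4, hbva3):.1f}%")  # 94.7%
print(f"HBVA4 vs ASHV:  {percent_identity(hbva4, ashv):.1f}%")   # 77.8%
```

Other definitions of identity divide by the full alignment length or by the shorter sequence length, which gives different values for gapped rows.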
Guide Tree
((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
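The guide tree is written in Newick notation, in which nested parentheses group sequences and each name is followed by a branch length. The sketch below is a minimal parser that lists the leaves and their branch lengths. The tree string in the code is a reconstruction that assumes the colons, commas and decimal points were lost in extraction (each six-digit run is read as a value with five decimals), and the names parse_newick and leaves are hypothetical helpers written for this example.

```python
# Reconstructed guide tree (punctuation restored under the stated assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume "," between siblings
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaves(tree):
    """Yield (name, branch_length) for every leaf, left to right."""
    name, length, children = tree
    if not children:
        yield name, length
    for child in children:
        yield from leaves(child)


tree = parse_newick(GUIDE_TREE)
for leaf_name, branch_length in leaves(tree):
    print(f"{leaf_name:20s} {branch_length:.5f}")
```

The parser handles only the subset of Newick used here (unquoted names, optional branch lengths); a dedicated phylogenetics library would be the safer choice for real analyses.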
Phylogram
Tertiary structure prediction
PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from file (pdb file), then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request. (A new browser window opens to load the pdb file; give the email ID that should receive the modeled structure.)
Open the new structure (received by email) and remove the template by selecting the target.
Choose File - save - layer (pdb).

Open Swiss model and select the load raw sequence option to load the target molecule.
Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.
Save the file as the project.
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible conformations for a polypeptide. Residues whose angles fall outside the favoured and allowed regions point to strained or incorrectly modeled backbone geometry.
Software: SAVS, http://nihserver.mbi.ucla.edu/SAVS/
Procedure: The target protein structure obtained after homology modeling using DeepView and modeler is given as input to SAVS.
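To make the φ and ψ angles concrete, the sketch below computes a signed dihedral angle from four atom positions. For residue i, φ is the dihedral over C(i-1), N(i), CA(i), C(i) and ψ is the dihedral over N(i), CA(i), C(i), N(i+1). The function name dihedral and the coordinates are illustrative assumptions; the coordinates are simple test points, not atoms taken from the modeled structure, and this is not the code used by SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, leaving the two
    # bond directions projected onto the plane perpendicular to it.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative points: a cis, a trans and a +90 degree arrangement.
cis = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
trans = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0))
quarter = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0))
print(cis, trans, quarter)
```

Applying the same function to the backbone atoms of every residue gives the (φ, ψ) pairs that a Ramachandran plot displays.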
SAVES results for proj_gunjan.pdb
PROCHECK summary
RAMACHANDRAN PLOT
Plot statistics
                                                        Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
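The percentages in the plot statistics follow directly from the residue counts. The short sketch below recomputes them and compares the most favoured fraction with the 90% guideline quoted above; the variable names are chosen for this example, and the counts are the ones reported in the PROCHECK summary.

```python
# Residue counts from the PROCHECK plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Glycine, proline and end residues are excluded from this total.
total = sum(counts.values())

percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")

meets_guideline = percentages["most favoured"] > 90.0
print("total residues counted:", total)
print("over 90% in most favoured regions:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline, which suggests that further refinement of the backbone would be worthwhile.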
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50   R = 5.25
R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75   R = 10.50
R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75
R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
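The size of the search reported in the log can be checked with a few lines of arithmetic. The sketch below assumes that the garbled entry `Iterate[278121] x FFT[642448]` originally read Iterate[27,812,1] x FFT[64,24,48], that is, 27 distance steps, 812 receptor rotations and 1 ligand rotation, each evaluated on a 64 x 24 x 48 FFT grid. That reading is a reconstruction, not something the log states in this form, but the products agree with the totals printed in the log.

```python
# Distance range from the log, with decimal points restored (assumption):
# 0.00 to 19.50 in steps of 0.75.
r_min, r_max, r_step = 0.00, 19.50, 0.75
distance_steps = round((r_max - r_min) / r_step) + 1

receptor_rotations = 812  # "Receptor 812" in the log
ligand_rotations = 1      # "Ligand 1" in the log
fft_grid = (64, 24, 48)   # assumed reading of "FFT[642448]"

n_ffts = distance_steps * receptor_rotations * ligand_rotations
cells_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
orientations = n_ffts * cells_per_fft

print("distance steps:", distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", orientations)
```

The log reports 21924 3D FFTs for 1616412672 orientations, which matches these products.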
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although the sequences are all closely related, they play an
important role in the survival of the virus in different species. It would be
interesting to look at the matter more closely at the gene level. A
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
become possible to draw firm conclusions about the true picture of evolution,
and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and present a theoretical model
built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree((((((Q8IVS8|GLCTK_HUMAN059519(P03154|HBEAG_DHBV1001176P0C6J9|HBEAG_DHBV3001119)036054)021341(Q64896|HBEAG_ASHV005849P03153|HBEAG_GSHV002446)012844)014364Q81105|HBEAG_HBVA5001168)
65
000175P0C625|HBEAG_HBVA3000818)
000110P0C692|HBEAG_HBVA2000445)000022P17099|HBEAG_HBVA4000942Q91C37|HBEAG_HBVA6000927)
PhylogramTertiary structure prediction
pdb 1QGT-C was selected as template which showed around 856 identity with target sequence and the template structure was downloaded from the PDB
Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV
Open the template structure from the File menu (PDB file), then choose SwissModel - Load Raw Sequence to load the raw target sequence.
Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.
Choose File - Save - Layer (pdb).
Choose File - Save - Project (pdb).
Choose SwissModel - Submit Modeling Request. (A new browser window opens to load the pdb file; give the e-mail ID at which the modeled structure is to be received.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (pdb).
Open SwissModel and select the Load Raw Sequence option to load the target molecule.
Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.
Save the file as the project.
Select "Submit Modeling Request" under SwissModel to submit the project for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig: Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and so shows which combinations of φ and ψ are possible for a polypeptide. For each residue, the plot gives the rotation angles about the two backbone bonds on either side of its alpha carbon (N-Cα for φ and Cα-C for ψ).
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)
Procedure: The target protein structure obtained after homology modeling with DeepView and the modeler is given as input to SAVS.
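Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of that calculation in plain Python, not the code used by SAVS or Procheck; the function names are illustrative and the coordinates in the usage lines are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_length = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_length
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi: torsion about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi: torsion about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


# Made-up points: the far atom is rotated a quarter turn about the central bond.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, abs(trans))
```

A validation program repeats this for every residue and then counts how many (φ, ψ) pairs fall in each region of the plot.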
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                          Score    %-age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
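The percentages in the Procheck summary follow directly from the residue counts, which makes the comparison with the 90% rule of thumb easy to reproduce. The sketch below is only an illustration of that arithmetic; the counts are copied from the summary and the function name is invented.

```python
# Residue counts copied from the Procheck summary (Gly, Pro and end residues excluded).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Share of residues in each Ramachandran region, in percent."""
    total = sum(region_counts.values())
    return {region: 100.0 * n / total for region, n in region_counts.items()}


percentages = region_percentages(counts)
for region, pct in percentages.items():
    print(f"{region:>20}: {pct:5.1f}%")

# Procheck's rule of thumb: over 90% in the most favoured regions.
meets_rule_of_thumb = percentages["most favoured"] > 90.0
print("over 90% in most favoured regions:", meets_rule_of_thumb)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline, so the disallowed residues would be the natural place to start refining it.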
Docking result by Hex software
Fig: Ligand & Receptor (2B8N)
Fig: After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
 ASP N   Radius = 1.40 Charge = -0.52
 ASP CA  Radius = 1.50 Charge = 0.25
 ASP C   Radius = 1.40 Charge = 0.53
 ASP O   Radius = 1.50 Charge = -0.50
 ASP CB  Radius = 1.70 Charge = -0.21
 ASP CG  Radius = 1.40 Charge = 0.62
 ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
 ASP N   Radius = 1.40 Charge = -0.52
 ASP CA  Radius = 1.50 Charge = 0.25
 ASP C   Radius = 1.40 Charge = 0.53
 ASP O   Radius = 1.50 Charge = -0.50
 ASP CB  Radius = 1.70 Charge = -0.21
 ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
 THR N   Radius = 1.40 Charge = -0.52
 THR CA  Radius = 1.50 Charge = 0.27
 THR C   Radius = 1.40 Charge = 0.53
 THR O   Radius = 1.50 Charge = -0.50
 THR CB  Radius = 1.50 Charge = 0.21
 THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
 ASP N   Radius = 1.40 Charge = -0.52
 ASP CA  Radius = 1.50 Charge = 0.25
 ASP C   Radius = 1.40 Charge = 0.53
 ASP O   Radius = 1.50 Charge = -0.50
 ASP CB  Radius = 1.70 Charge = -0.21
 ASP CG  Radius = 1.40 Charge = 0.62
 ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
 ASP N   Radius = 1.40 Charge = -0.52
 ASP CA  Radius = 1.50 Charge = 0.25
 ASP C   Radius = 1.40 Charge = 0.53
 ASP O   Radius = 1.50 Charge = -0.50
 ASP CB  Radius = 1.70 Charge = -0.21
 ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
 LYS N   Radius = 1.40 Charge = -0.52
 LYS CA  Radius = 1.50 Charge = 0.23
 LYS C   Radius = 1.40 Charge = 0.53
 LYS O   Radius = 1.50 Charge = -0.50
 LYS CB  Radius = 1.70 Charge = 0.04
 LYS CG  Radius = 1.70 Charge = 0.05
 LYS CD  Radius = 1.70 Charge = 0.05
 LYS CE  Radius = 1.70 Charge = 0.22
 LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
 MSE N   Radius = 1.40 Charge = -0.52
 MSE CA  Radius = 1.50 Charge = 0.14
 MSE C   Radius = 1.40 Charge = 0.53
 MSE O   Radius = 1.50 Charge = -0.50
 MSE CB  Radius = 1.70 Charge = 0.04
 MSE CG  Radius = 1.70 Charge = 0.09
 MSE SE  Radius = 1.90 Charge = 0.32
 MSE CE  Radius = 1.90 Charge = 0.01
 MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
 THR N   Radius = 1.40 Charge = -0.52
 THR CA  Radius = 1.50 Charge = 0.27
 THR C   Radius = 1.40 Charge = 0.53
 THR O   Radius = 1.50 Charge = -0.50
 THR CB  Radius = 1.50 Charge = 0.21
 THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory: setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells: 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT: N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair   RMS               Bumps
----  --------- --------- --------- ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models   Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000:000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
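Several figures in the Hex log can be cross-checked against each other, which also helps confirm where the decimal points lost in extraction belong. The sketch below is an illustrative sanity check, not part of Hex. It assumes that "Grid = 060A" means a 0.60 A grid, that the distance range is 0.00 to 19.50 in steps of 0.75, that the three Iterate numbers are distance steps, receptor rotations and ligand rotations, and that the per-atom charges of residue A 52 MSE read as shown; all function and variable names are invented.

```python
# Values transcribed from the Hex log; decimal points are assumed where lost.
GRID_SPACING = 0.60       # Angstrom, from "Grid = 060A"
EXTERIOR_CELLS = 201019
INTERIOR_CELLS = 216848


def skin_volume(n_cells, spacing=GRID_SPACING):
    """Volume covered by n_cells cubic grid cells with the given edge length."""
    return n_cells * spacing ** 3


def distance_steps(r_min, r_max, step):
    """Number of sampled distances from r_min to r_max inclusive."""
    return round((r_max - r_min) / step) + 1


# 1. Skin volumes: cell count times cell volume.
exterior_volume = skin_volume(EXTERIOR_CELLS)
interior_volume = skin_volume(INTERIOR_CELLS)

# 2. Size of the 6D search: (distances x receptor rotations x ligand rotations)
#    3D FFTs, each covering a 64 x 24 x 48 grid of orientations.
n_distances = distance_steps(0.00, 19.50, 0.75)
n_ffts = n_distances * 812 * 1
orientations = n_ffts * (64 * 24 * 48)

# 3. Net charge of residue A 52 MSE from its listed per-atom charges.
mse_charges = {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
               "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25}
mse_net_charge = sum(mse_charges.values())

# 4. Net formal charge from the counted charged residues.
net_formal_charge = 104 - 114

print(f"exterior skin volume: {exterior_volume:.2f}")
print(f"interior skin volume: {interior_volume:.2f}")
print(f"3D FFTs: {n_ffts}, orientations: {orientations}")
print(f"MSE net charge: {mse_net_charge:.2f}, net formal charge: {net_formal_charge}")
```

The skin volumes, the FFT count and the total number of orientations reproduce the logged values exactly under these assumptions. The summed MSE charge differs from the logged 0.35 only in the second decimal place, which is presumably rounding of the printed per-atom charges.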
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
by studying them at the gene level, where a phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of its evolution
in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To find the unknown structure of the protein present in the different
species, we carried out homology modelling, and we took a step forward by
presenting a theoretical model built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is one
of the secondary causes of the pathogenicity of HBV. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, and hence we can look forward to a better,
disease-free world.
The work presented is just a small part of a big problem, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
78
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40, Charge = -0.52
  ASP CA  Radius = 1.50, Charge = 0.25
  ASP C   Radius = 1.40, Charge = 0.53
  ASP O   Radius = 1.50, Charge = -0.50
  ASP CB  Radius = 1.70, Charge = -0.21
  ASP CG  Radius = 1.40, Charge = 0.62
  ASP H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS CE  Radius = 1.70, Charge = 0.22
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS CE  Radius = 1.70, Charge = 0.22
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS CE  Radius = 1.70, Charge = 0.22
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40, Charge = -0.52
  ASP CA  Radius = 1.50, Charge = 0.25
  ASP C   Radius = 1.40, Charge = 0.53
  ASP O   Radius = 1.50, Charge = -0.50
  ASP CB  Radius = 1.70, Charge = -0.21
  ASP H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40, Charge = -0.52
  LYS CA  Radius = 1.50, Charge = 0.23
  LYS C   Radius = 1.40, Charge = 0.53
  LYS O   Radius = 1.50, Charge = -0.50
  LYS CB  Radius = 1.70, Charge = 0.04
  LYS CG  Radius = 1.70, Charge = 0.05
  LYS CD  Radius = 1.70, Charge = 0.05
  LYS CE  Radius = 1.70, Charge = 0.22
  LYS H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40, Charge = -0.52
  MSE CA  Radius = 1.50, Charge = 0.14
  MSE C   Radius = 1.40, Charge = 0.53
  MSE O   Radius = 1.50, Charge = -0.50
  MSE CB  Radius = 1.70, Charge = 0.04
  MSE CG  Radius = 1.70, Charge = 0.09
  MSE SE  Radius = 1.90, Charge = 0.32
  MSE CE  Radius = 1.90, Charge = 0.01
  MSE H   Radius = 0.00, Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40, Charge = -0.52
  THR CA  Radius = 1.50, Charge = 0.27
  THR C   Radius = 1.40, Charge = 0.53
  THR O   Radius = 1.50, Charge = -0.50
  THR CB  Radius = 1.50, Charge = 0.21
  THR H   Radius = 0.00, Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
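The fractional-charge warnings in this part of the log can be cross-checked by summing the per-atom charges listed under each residue. The sketch below does this for the MSE and B 8 LYS entries. It assumes the logged values carry two decimal places (so a charge printed as 052 with a minus sign reads as -0.52); the dictionary and function names are illustrative and are not part of Hex.

```python
# Per-atom partial charges as listed in the log for two residues.
# Assumption: decimal points were lost in extraction, so "-052" means -0.52.
MSE_CHARGES = {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50,
               "CB": 0.04, "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25}
LYS_B8_CHARGES = {"N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50,
                  "CB": 0.04, "CG": 0.05, "CD": 0.05, "H": 0.25}


def residue_charge(charges):
    """Sum the partial charges of all listed atoms, rounded to 2 places."""
    return round(sum(charges.values()), 2)


def net_formal_charge(n_positive, n_negative):
    """Net formal charge from counts of +ve and -ve charged residues."""
    return n_positive - n_negative


print("MSE total:", residue_charge(MSE_CHARGES))
print("B 8 LYS total:", residue_charge(LYS_B8_CHARGES))
print("Net formal charge:", net_formal_charge(104, 114))
```

Under these assumptions the sums land within 0.01 of the logged 0.35 and 0.12, a gap that would be consistent with the log printing rounded per-atom values. B 8 LYS lists no CE atom, and its logged total is lower than that of A 318 LYS by the CE charge (0.22).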
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
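The skin volumes reported in this part of the log appear to be the number of grid cells multiplied by the volume of one cubic cell. A minimal sketch of that check follows, assuming the grid spacing reads as 0.60 Å and the logged volumes read as 43420.10 and 46839.17 (decimal points restored by assumption); the function name is hypothetical.

```python
GRID_SPACING = 0.60  # assumed: the logged grid value read as 0.60 angstrom


def skin_volume(n_cells, spacing=GRID_SPACING):
    """Volume covered by n_cells cubic grid cells of the given edge length."""
    return n_cells * spacing ** 3


exterior = skin_volume(201019)   # exterior skin grid cells from the log
interior = skin_volume(216848)   # interior skin grid cells from the log
total_cells = 201019 + 216848

print(f"exterior = {exterior:.2f}, interior = {interior:.2f}")
print(f"total cells = {total_cells}")
```

The cell total also matches the 417867 cells that the log says the skin integration was applied to.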
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N, Tag = 2B8N
Ligand 2B8N, Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
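The size of the 6D search quoted in this part of the log can be reproduced from the loop counts. The sketch below assumes the bracketed figures read as Iterate[27, 812, 1] and FFT[64, 24, 48], and that the three Iterate factors are distance steps, receptor rotations and ligand rotations. That labelling is an assumption, although the products do match the logged totals.

```python
from math import prod

# Distance steps from 0.00 to 19.50 in increments of 0.75 (assumed decimals).
n_distance_steps = round((19.50 - 0.00) / 0.75) + 1

iterate = (n_distance_steps, 812, 1)  # assumed reading of the Iterate[...] figure
fft_grid = (64, 24, 48)               # assumed reading of the FFT[...] figure

n_ffts = prod(iterate)                     # number of 3D FFTs performed
n_orientations = n_ffts * prod(fft_grid)   # total orientations sampled

# Timing components read as seconds with two decimals (assumption).
elapsed = 53.66 + 73.01 + 277.87 + 15.45

print(n_distance_steps, n_ffts, n_orientations)
print(f"{elapsed:.2f} s, about {elapsed / 60:.1f} min")
```

With these readings the counts agree with the 21924 FFTs and 1616412672 orientations in the log, and the four timing components add up to roughly the reported 7 minutes.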
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  -----
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter at the gene level. A phylogenetic analysis can be very helpful
in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will
be possible to draw firm conclusions about the true picture of its evolution
in the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and took a step towards presenting
a theoretical model using available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
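As a small illustration of the data analysis step described above, the sketch below computes pairwise p-distances (the fraction of differing sites) between aligned sequences, a common starting point for distance-based tree building. The sequences and strain labels are made-up placeholders, not data from this report, and the function name is hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments; labels are placeholders only.
alignment = {
    "strain_1": "MDIDPYKEFG",
    "strain_2": "MDIDPYKEFG",
    "strain_3": "MDIDTYKEF-",
}

# Distance for every unordered pair of sequences in the alignment.
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A matrix of such distances is the kind of input that distance-based methods, for example neighbour joining, take when building a tree.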
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in the future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in finding a
cure for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, so we can look forward to a better, disease-free
world.
The work presented is just a small part of a big issue, and much work still
needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
77
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
78
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
--------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
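The size of the search space reported in the log can be checked with a few lines of arithmetic. The Python sketch below assumes that the separators lost from the log are restored as follows: a distance range of 0.00 to 19.50 in steps of 0.75, rotational counts of 812 (receptor) and 1 (ligand), and an FFT grid of 64 x 24 x 48. These readings are assumptions, adopted here because they reproduce the totals printed by Hex; the variable names are illustrative only.

```python
# Assumed readings of the garbled log values (separators restored by hand).
distance_start, distance_stop, distance_step = 0.0, 19.50, 0.75  # "000 to 1950 with steps of 075"
receptor_rotations, ligand_rotations = 812, 1                    # "Receptor 812 ... Ligand 1"
fft_grid = (64, 24, 48)                                          # "FFT[642448]"

# Number of intermolecular distances sampled, including both end points.
n_distances = round((distance_stop - distance_start) / distance_step) + 1

# One 3D FFT is run per (distance, receptor rotation, ligand rotation) triple.
n_ffts = n_distances * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the 3D grid.
cells_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * cells_per_fft

print(n_distances)         # 27
print(n_ffts)              # 21924
print(total_orientations)  # 1616412672
```

Under these assumptions the figures agree with the log lines "Done 21924 3D FFTs" and "Total 6D space ... = 1616412672".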
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although the sequences are closely related, they play an
important role in survival in different species. It would be interesting to
look more closely at the matter by studying it at the gene level. A
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing HBV gene sequencing project is finished, we hope it will
be possible to draw firm conclusions about the true picture of evolution in
the near future, and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling and took a step forward by
presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein is one of the
secondary causes of the pathogenicity of HBV. We therefore tried to dock
this protein with an appropriate ligand in order to inhibit its activity,
as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone towards
such discoveries, although it is only a small finding within a much larger
problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. It includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in the search
for a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them, and to other infectious diseases, so
that we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a sound phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
Perform Magic Fit and then Iterative Magic Fit, both provided under the Fit menu, in order to fit the two sequences
Save the file as a project
Select "Submit Modeling Request" under the SwissModel menu to submit the project for modeling
Homology modeling
Optimise Mode: request submission form
Please fill in these fields
Your e-mail address: Lakshay1202gmailcom (MUST be correct)
Your name: Lakshay
Request title: Lakshay project (will be added to the results header)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are sterically possible for a polypeptide. The plot therefore indicates whether the backbone conformation of each residue in the modelled protein lies in a favoured, allowed, or disallowed region.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.
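As a minimal sketch of how the φ and ψ values on a Ramachandran plot are obtained, the Python snippet below computes a signed dihedral angle from four atom positions. It assumes the usual backbone definitions (φ from C(i-1), N(i), CA(i), C(i); ψ from N(i), CA(i), C(i), N(i+1)). The function name `dihedral` and the test coordinates are hypothetical and are not taken from SAVS or PROCHECK.

```python
import math

def _sub(a, b):
    """Vector a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Scalar product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Vector product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (x, y, z)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates, chosen only to exercise the function.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
quarter_turn = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(cis, quarter_turn, trans)
```

For φ, the four points would be the coordinates of C(i-1), N(i), CA(i) and C(i) read from the model's PDB file; for ψ, those of N(i), CA(i), C(i) and N(i+1).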
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result
Plot statistics                                        Score   %age
Residues in most favoured regions [A,B,L]                990   85.6
Residues in additional allowed regions [a,b,l,p]         104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0
Residues in disallowed regions                            51    4.4
                                                        ----  -----
Number of non-glycine and non-proline residues          1156  100.0
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms
and R-factor no greater than 20%, a good quality model would be expected
to have over 90% in the most favoured regions.
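The percentages in the plot statistics follow directly from the residue counts. The short Python sketch below recomputes them and applies the 90 per cent rule of thumb quoted in the PROCHECK summary. The variable names are illustrative; the counts are those reported in the summary.

```python
# Residue counts per Ramachandran region, as listed in the plot statistics.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Denominator: all non-glycine, non-proline residues.
total = sum(region_counts.values())

# Percentage of residues in each region, rounded to one decimal place.
percentages = {
    region: round(100.0 * count / total, 1)
    for region, count in region_counts.items()
}

# Rule of thumb from the summary: over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0
is_good_quality = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:20s} {region_counts[region]:5d} {pct:5.1f}%")
print("total:", total, "| meets 90% criterion:", is_good_quality)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline.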
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
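The search-space figures reported in the docking log can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on two assumptions that are not stated in the log itself: that decimal points were lost in extraction, so the distance range reads 0.00 to 19.50 in steps of 0.75, and that `Iterate[278121]` and `FFT[642448]` stand for the triples (27, 812, 1) and (64, 24, 48). These readings were chosen because they reproduce the totals the log reports (21924 3D FFTs and 1616412672 orientations).

```python
from math import prod


def distance_steps(start, stop, step):
    """Return the list of intermolecular distances sampled by the search.

    Assumes an inclusive range, as suggested by the R = ... lines in the log.
    """
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(n)]


# Assumed reading of "distance range = 000 to 1950 with steps of 075".
steps = distance_steps(0.0, 19.5, 0.75)

# Assumed reading of "Iterate[278121]": distance steps x receptor rotations x ligand rotations.
iterate = (len(steps), 812, 1)

# Assumed reading of "FFT[642448]": the grid dimensions of one 3D FFT.
fft_grid = (64, 24, 48)

n_ffts = prod(iterate)                    # number of 3D FFTs performed
n_orientations = n_ffts * prod(fft_grid)  # size of the 6D search space

# Net formal charge from the counts of charged residues reported in the log.
net_charge = 104 - 114

if __name__ == "__main__":
    print("distance steps:", len(steps))
    print("3D FFTs:", n_ffts)
    print("orientations:", n_orientations)
    print("net formal charge:", net_charge)
```

Under these assumptions the 27 distance steps, the 812 receptor rotations and the single ligand rotation account for every 3D FFT the log counts, which supports reading the stripped numbers this way.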
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although the strains are all closely related, the proteins play
an important role in survival in different species. It would be interesting to
take a closer look at the matter at the gene level. A phylogenetic analysis
can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution in the near future, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To find the unknown structure of a protein present in the different
species, we used homology modelling. We went a step further and
presented a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein, which is
coded by a gene, is one of the secondary causes of the pathogenicity of
HBV. We therefore tried to dock this protein with an appropriate ligand in
order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone toward any
such discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
Select "submit modeling request" under Swiss model to submit it for modeling.
Homology modeling
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Lakshay1202gmailcom (MUST be correct)
Your Name: Lakshay
Request title: Lakshay project (Will be added to the results header)
Your SWISS-MODEL project file can be found in:
CDocuments and SettingsuserDesktopproj_kumarpdb
Workunit: P000044 Title: Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range: 83 to 514
based on template 2b8nA (2.53 Å)
Sequence Identity [%]: 34
E-value: 2.70e-52
Fig. Structure of template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it shows which conformations of the φ and ψ angles are possible for a polypeptide. From this plot, the backbone conformation around each alpha carbon atom of the protein of interest can be assessed.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
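To make the φ and ψ angles concrete, the following minimal sketch computes a dihedral angle from four atom positions using only the Python standard library. It is an illustration, not the method used by SAVS or Procheck; the helper names (`dihedral`, `phi`, `psi`) are hypothetical, and the coordinates in the usage lines are made-up test points rather than atoms from the modelled structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)              # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)              # central bond
    b2 = _sub(p3, p2)              # bond from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Made-up points forming a right-angle twist about the z axis.
    print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                   (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Plotting the (φ, ψ) pair of every residue gives the Ramachandran plot; a validation tool then counts how many pairs fall in each region.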
SAVES results for proj_gunjanpdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                        Score    %age
Residues in most favoured regions [A,B,L]                 990    85.6
Residues in additional allowed regions [a,b,l,p]          104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0
Residues in disallowed regions                             51     4.4
                                                         ----   -----
Number of non-glycine and non-proline residues           1156   100.0
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
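The percentage column of the Procheck summary can be recomputed from the residue counts. The short sketch below assumes that the extracted values 856, 90, 10 and 44 are percentages whose decimal points were lost (85.6, 9.0, 1.0 and 4.4); the variable names and the 90 per cent threshold constant are introduced here purely for illustration, the threshold being the rule of thumb quoted in the Procheck text.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary.
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Only non-glycine, non-proline residues are counted in the four regions.
non_gly_pro = sum(region_counts.values())

percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

# The remaining residue classes listed in the summary.
end_residues = 8
glycines = 127
prolines = 60
total_residues = non_gly_pro + end_residues + glycines + prolines

# Rule of thumb quoted by Procheck for a good quality model.
GOOD_MODEL_THRESHOLD = 90.0
meets_threshold = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

if __name__ == "__main__":
    for region, pct in percentages.items():
        print(f"{region}: {pct}%")
    print("total residues:", total_residues)
    print("over 90% in most favoured regions:", meets_threshold)
```

Read this way, the counts add up to the reported total of 1351 residues, and the share of residues in the most favoured regions falls below the 90 per cent mark that Procheck associates with a good quality model.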
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in finding a cure for
the infectious disease bird flu, and may also open new avenues for finding
other possible drug targets in the influenza A virus.
The docking results can be used to design new lead compounds and hence can aid
the drug discovery process.
Finally, a similar process can be applied to other pathogens and other
infectious diseases, so that possible therapeutic sites can be identified in
them and we can look forward to a better, disease-free world.
The work presented is just a small part of a large problem, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long way
and prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044   Title: Q8IVS8
SWISS-MODEL WORKSPACE
Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
Fig. Structure of the template after modeling
Model Validation
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process.
A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. The two angles describe rotation about the two backbone bonds on either side of each alpha carbon atom (N-Cα for φ, Cα-C for ψ), so the backbone conformation at every residue of the desired protein can be read from this plot.
Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)
Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.
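As an illustration of what the plot encodes, the following minimal Python sketch computes a signed dihedral angle from four atomic positions. φ is taken over the atoms C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1). The function names and the toy coordinates are hypothetical and are not taken from the modelled structure; the validation server does this calculation internally and its implementation is not reproduced here.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 degrees is the cis (eclipsed) arrangement, +/-180 degrees is trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)                       # proportional to cos(angle)
    y = _dot(b2, _cross(n1, n2)) / b2_len  # proportional to sin(angle)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Toy coordinates (hypothetical), chosen so the answer is easy to check.
    print(phi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Each residue contributes one (φ, ψ) pair computed this way, and the plot is the scatter of those pairs over the whole chain.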
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT
Result: Plot statistics
                                                       Score   %age
Residues in most favoured regions [A,B,L]                990   85.6
Residues in additional allowed regions [a,b,l,p]         104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0
Residues in disallowed regions                            51    4.4
                                                        ----  -----
Number of non-glycine and non-proline residues          1156  100.0
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351
Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
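The percentages in the Procheck summary follow directly from the residue counts. The short Python sketch below recomputes them and compares the most favoured fraction with the 90% guideline quoted in the summary. The counts are copied from the summary above; the names `REGION_COUNTS`, `region_percentages` and `meets_guideline` are illustrative and are not part of Procheck.

```python
# Residue counts per Ramachandran region, copied from the Procheck summary.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Guideline quoted in the summary for a good quality model.
GOOD_MODEL_THRESHOLD = 90.0


def region_percentages(counts):
    """Return each region's share of the non-Gly, non-Pro residues in percent."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_guideline(counts, threshold=GOOD_MODEL_THRESHOLD):
    """True if the most favoured share is above the threshold."""
    return region_percentages(counts)["most favoured"] > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(f"{region:20s} {REGION_COUNTS[region]:5d} {pct:6.1f}%")
    print("meets 90% guideline:", meets_guideline(REGION_COUNTS))
```

With these counts, the most favoured share is 990 of 1156 residues, which is below the 90% guideline quoted in the summary.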
Docking result by Hex software
Fig. Ligand & receptor (2B8N)
Fig. After docking
Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N    Radius = 1.40  Charge = -0.52
  ASP CA   Radius = 1.50  Charge = 0.25
  ASP C    Radius = 1.40  Charge = 0.53
  ASP O    Radius = 1.50  Charge = -0.50
  ASP CB   Radius = 1.70  Charge = -0.21
  ASP CG   Radius = 1.40  Charge = 0.62
  ASP H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS CE   Radius = 1.70  Charge = 0.22
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS CE   Radius = 1.70  Charge = 0.22
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS CE   Radius = 1.70  Charge = 0.22
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N    Radius = 1.40  Charge = -0.52
  ASP CA   Radius = 1.50  Charge = 0.25
  ASP C    Radius = 1.40  Charge = 0.53
  ASP O    Radius = 1.50  Charge = -0.50
  ASP CB   Radius = 1.70  Charge = -0.21
  ASP H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N    Radius = 1.40  Charge = -0.52
  LYS CA   Radius = 1.50  Charge = 0.23
  LYS C    Radius = 1.40  Charge = 0.53
  LYS O    Radius = 1.50  Charge = -0.50
  LYS CB   Radius = 1.70  Charge = 0.04
  LYS CG   Radius = 1.70  Charge = 0.05
  LYS CD   Radius = 1.70  Charge = 0.05
  LYS CE   Radius = 1.70  Charge = 0.22
  LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N    Radius = 1.40  Charge = -0.52
  MSE CA   Radius = 1.50  Charge = 0.14
  MSE C    Radius = 1.40  Charge = 0.53
  MSE O    Radius = 1.50  Charge = -0.50
  MSE CB   Radius = 1.70  Charge = 0.04
  MSE CG   Radius = 1.70  Charge = 0.09
  MSE SE   Radius = 1.90  Charge = 0.32
  MSE CE   Radius = 1.90  Charge = 0.01
  MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N    Radius = 1.40  Charge = -0.52
  THR CA   Radius = 1.50  Charge = 0.27
  THR C    Radius = 1.40  Charge = 0.53
  THR O    Radius = 1.50  Charge = -0.50
  THR CB   Radius = 1.50  Charge = 0.21
  THR H    Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
------------------------------------------------------------------------------
Clst  Soln  Models   Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp    RMS
------------------------------------------------------------------------------
   1     1  000/000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000/000     0.0     0.0     0.0   0.0     0.0     0.0   -1  -1.00
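The size of the search reported in the Hex log can be checked with a few lines of arithmetic. The Python sketch below assumes that the search-space line reads as 27 x 812 x 1 iterations (distance steps x receptor rotations x ligand rotations) multiplied by a 64 x 24 x 48 FFT grid, and that the timing line reads as 53.66 s, 73.01 s, 277.87 s and 15.45 s. That reading is an interpretation of the log, not something the Hex documentation is quoted for here, and all names in the code are illustrative.

```python
import math

# Values as read from the Hex log (decimal points and separators are an
# interpretation of the extracted text).
DIST_START, DIST_END, DIST_STEP = 0.00, 19.50, 0.75
RECEPTOR_ROTATIONS = 812
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)
REPORTED_FFTS = 21924
REPORTED_ORIENTATIONS = 1_616_412_672
TIMINGS_S = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}


def distance_steps(start, end, step):
    """Number of sampled intermolecular distances, both end points included."""
    return round((end - start) / step) + 1


def search_space(n_dist, n_receptor, n_ligand, grid):
    """Return (number of 3D FFTs, total number of trial orientations)."""
    n_ffts = n_dist * n_receptor * n_ligand
    return n_ffts, n_ffts * math.prod(grid)


if __name__ == "__main__":
    n_dist = distance_steps(DIST_START, DIST_END, DIST_STEP)
    n_ffts, n_orient = search_space(
        n_dist, RECEPTOR_ROTATIONS, LIGAND_ROTATIONS, FFT_GRID
    )
    print("distance steps:", n_dist)
    print("3D FFTs:", n_ffts, "matches log:", n_ffts == REPORTED_FFTS)
    print("orientations:", n_orient, "matches log:", n_orient == REPORTED_ORIENTATIONS)
    print("total time (s):", round(sum(TIMINGS_S.values()), 2))
```

Under this reading the computed counts agree with the figures in the log, and the four timings add up to about 420 seconds, consistent with the "7 min 0 sec" reported for the 3D FFT search.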
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
Fig structure of template after modeling
Model Validation
INTRODUCTION
Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot SoftwareSAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS
71
SAVES results for proj_gunjanpdb
Procheck summary
RAMCHANDRAN POLT
72
Result ----- Plot statistics SCORE ageResidues in most favoured regions [ABL] 990 856Residues in additional allowed regions [ablp] 104 90Residues in generously allowed regions [~a~b~l~p] 11 10Residues in disallowed regions 51 44 ---- ---- ------------------Number of non-glycine and non-proline residues 1156 1000Number of end-residues (excl Gly and Pro) 8
73
Number of glycine residues (shown as triangles) 127 (59)
Number of proline residues 60 ----Total number of residues 1351
Based on an analysis of 118 structures oand R-factor no greater than 20 a good quality model would be expectedto have over 90 in the most favoured regions
Docking result ---- by Hex software
FigLigand amp Receptor (2B8N)
74
Fig after docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file
75
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
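The search-space figures reported in the log can be checked with a little arithmetic. The sketch below is not part of Hex. It assumes that the bracketed values in the "Total 6D space" line read Iterate[27, 812, 1] and FFT[64, 24, 48], with the separators lost in extraction, and that the distance range runs from 0.00 to 19.50 in steps of 0.75. Under those assumptions the products reproduce the logged counts of 21924 3D FFTs and 1616412672 orientations.

```python
from math import prod

# Assumed reading of the log line "Iterate[278121] x FFT[642448]":
# 27 distance steps, 812 receptor rotations, 1 ligand rotation,
# and a 64 x 24 x 48 FFT grid. The splits are an interpretation;
# the extracted log does not show separators.
iterate = (27, 812, 1)
fft_grid = (64, 24, 48)


def distance_steps(start, stop, step):
    """Number of sampled distances from start to stop, both inclusive."""
    return round((stop - start) / step) + 1


n_ffts = prod(iterate)                    # one 3D FFT per iterate combination
n_orientations = n_ffts * prod(fft_grid)  # grid points scored per FFT

print(distance_steps(0.0, 19.5, 0.75))  # 27
print(n_ffts)                           # 21924
print(n_orientations)                   # 1616412672
```

The 27 distance steps and the 812 receptor rotations both appear elsewhere in the log (the list of R values and the "Receptor 812" entry), which supports this reading of the bracketed figures.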
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that, although the sequences are all closely related, they play an
important role in survival in different species. It would be interesting to
take a closer look at the matter by studying it at the gene level. A
phylogenetic analysis can be very helpful in understanding the evolutionary
pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will soon be possible to draw firm conclusions about the true
picture of its evolution, and that the genes responsible for pathogenesis
can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling. We took a step forward by
presenting a theoretical model built with available online modelling tools.
In our study, the HBeAg (glycerate kinase) protein encoded by the gene is
one of the secondary causes of the pathogenicity of HBV. We therefore tried
to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries. It is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and they will also open new avenues
for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope, however, that these findings will
go a long way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT

Plot statistics                                         Score    %age
---------------------------------------------------------------------
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%

Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
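The percentages in the Procheck summary follow directly from the residue counts. The short sketch below recomputes them as a check. The counts are taken from the summary, while the dictionary and function names are illustrative only and are not part of Procheck or SAVES. It also applies the guideline quoted in the summary, that a good quality model is expected to have over 90% of its residues in the most favoured regions.

```python
# Residue counts from the Procheck Ramachandran summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts):
    """Return each region's share of non-Gly, non-Pro residues, in percent."""
    total = sum(region_counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in region_counts.items()}


percentages = region_percentages(counts)
total_core = sum(counts.values())      # non-glycine, non-proline residues
total_all = total_core + 8 + 127 + 60  # plus end-residues, Gly and Pro

for name, pct in percentages.items():
    print(f"{name:20s} {counts[name]:5d} {pct:5.1f}%")

# Guideline from the summary: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0
print("meets 90% guideline:", meets_guideline)
```

With these counts the most favoured regions hold 85.6% of the residues, so the model falls short of the 90% guideline.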
Docking result by Hex software
Fig. Ligand & Receptor (2B8N)
Fig. After docking
DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059
Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5mac
Disc Cache enabled Using directory CProgram FilesHex 50cache
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
77
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
78
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000  R = 075  R = 150  R = 225  R = 300  R = 375  R = 450  R = 525  R = 600
R = 675  R = 750  R = 825  R = 900  R = 975  R = 1050  R = 1125  R = 1200  R = 1275
R = 1350  R = 1425  R = 1500  R = 1575  R = 1650  R = 1725  R = 1800  R = 1875  R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000  R = 040  R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100
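As a rough check on the search-space figures reported in the log above, the sketch below recomputes them in Python. It rests on assumptions: that the fused digit runs in the log stand for Iterate[27, 812, 1] and FFT[64, 24, 48], and that the distance range is 0.00 to 19.50 in steps of 0.75 (separators and decimal points are missing from the log as reproduced here). The meaning given to each factor and all variable names are illustrative, not taken from Hex.

```python
from math import prod

# Values read from the log. The grouping of the fused digit runs
# ("278121" -> 27, 812, 1 and "642448" -> 64, 24, 48) is an assumption.
iterate_dims = (27, 812, 1)   # assumed: distance steps, receptor rotations, ligand rotations
fft_dims = (64, 24, 48)       # assumed: dimensions of each 3D FFT


def distance_steps(start, stop, step):
    """Return the sampled intermolecular distances, both ends included."""
    count = round((stop - start) / step) + 1
    return [start + i * step for i in range(count)]


# Assumed decimals: the log's "000 to 1950 with steps of 075".
steps = distance_steps(0.0, 19.5, 0.75)

n_ffts = prod(iterate_dims)               # number of 3D FFTs performed
n_orientations = n_ffts * prod(fft_dims)  # size of the 6D search space

print(len(steps), n_ffts, n_orientations)
```

Under these assumptions the script prints `27 21924 1616412672`, which agrees with the 27 distance values listed, the 21924 3D FFTs, and the 1616412672 orientations reported in the log.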
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we conclude
that although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look at
the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that
they evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be
possible in the near future to draw firm conclusions about the true picture of
evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling. We then went a step further and
presented a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by the
gene is one of the secondary causes of HBV pathogenicity. We therefore tried to
dock this protein with an appropriate ligand in order to inhibit its activity,
which can serve as a basis for developing drugs.
Future prospects
The work presented in this report may be just a stepping stone toward such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. This includes methods for collecting and analyzing data, as well as the
interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in finding a cure
for the infectious disease bird flu, and may also open new avenues for finding
other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can therefore
aid the discovery of new drugs.
Finally, a similar process can be applied to other pathogens, so that possible
therapeutic sites can be identified in them. Similar methods can also be
applied to other infectious diseases, and we can therefore look forward to a
better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work still
needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings will
go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although they are all closely related, they play an important
role in survival in different species. It would be interesting to take a
closer look at the matter by studying it at the gene level. A phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible in the near future to draw firm conclusions about the true
picture of evolution and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling. We then went a step further and
presented a theoretical model built with the available online modelling
tools.
Our study indicates that the HBeAg (Glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, which provides a basis on which drugs can be
developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries. The present work may be a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
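As an illustration of the kind of data analysis phylogenetics relies on, the sketch below computes the uncorrected p-distance (the fraction of differing sites) between already aligned sequences. This is a generic technique chosen for illustration, not necessarily the method used in this report, and the strain names and sequences are made-up toy data.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = sorted(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y])
            for x in names for y in names}


if __name__ == "__main__":
    # Hypothetical toy alignment, not real HBV data.
    toy = {
        "strain_A": "MQLFHLCLII",
        "strain_B": "MQLFHLCLVI",
        "strain_C": "MQ-FPLCLVI",
    }
    matrix = distance_matrix(toy)
    for (x, y), d in sorted(matrix.items()):
        if x < y:
            print(f"{x} vs {y}: {d:.3f}")
```

A matrix of this kind is the usual input to distance-based tree-building methods; corrected distance models are normally preferred over the raw p-distance when sequences are more divergent.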
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu; they will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and will prove fruitful to anyone working in a similar area.
Abbreviation
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8N
Warning Cant add all hydrogens to incomplete residue A 253LYS
Warning Cant add all hydrogens to incomplete residue A 316HIS
Warning Cant add all hydrogens to incomplete residue A 318LYS
Warning Cant add all hydrogens to incomplete residue A 393LYS
Warning Cant add all hydrogens to incomplete residue B 8LYS
Warning Cant add all hydrogens to incomplete residue B 9LYS
Warning Cant add all hydrogens to incomplete residue B 17LYS
Warning Cant add all hydrogens to incomplete residue B 34LYS
Warning Cant add all hydrogens to incomplete residue B 36ASN
Warning Cant add all hydrogens to incomplete residue B 62LYS
Warning Cant add all hydrogens to incomplete residue B 65ARG
Warning Cant add all hydrogens to incomplete residue B 66LYS
Warning Cant add all hydrogens to incomplete residue B 316HIS
Warning Cant add all hydrogens to incomplete residue B 370LYS
Warning Cant add all hydrogens to incomplete residue B 380TYR
Warning Cant add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)
Warning Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPCG Radius = 140 Charge = 062
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052
  ASPCA Radius = 150 Charge = 025
  ASPC Radius = 140 Charge = 053
  ASPO Radius = 150 Charge = -050
  ASPCB Radius = 170 Charge = -021
  ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052
  LYSCA Radius = 150 Charge = 023
  LYSC Radius = 140 Charge = 053
  LYSO Radius = 150 Charge = -050
  LYSCB Radius = 170 Charge = 004
  LYSCG Radius = 170 Charge = 005
  LYSCD Radius = 170 Charge = 005
  LYSCE Radius = 170 Charge = 022
  LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052
  MSECA Radius = 150 Charge = 014
  MSEC Radius = 140 Charge = 053
  MSEO Radius = 150 Charge = -050
  MSECB Radius = 170 Charge = 004
  MSECG Radius = 170 Charge = 009
  MSESE Radius = 190 Charge = 032
  MSECE Radius = 190 Charge = 001
  MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052
  THRCA Radius = 150 Charge = 027
  THRC Radius = 140 Charge = 053
  THRO Radius = 150 Charge = -050
  THRCB Radius = 150 Charge = 021
  THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
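The fractional-charge warnings in the log above are easier to review as a list of residues than as running text. The following is a minimal sketch, in Python, of how they could be pulled out of the extracted log. The function name `fractional_charge_residues` and the regular expression are illustrative assumptions and are not part of Hex. The charge is kept as the raw digit string because the extracted text has lost its decimal points.

```python
import re

# Matches text such as
#   "Fractional charge (035) for non-terminal residue A 52MSE"
# exactly as it appears in the extracted log (separators are missing).
PATTERN = re.compile(
    r"Fractional charge \((-?\d+)\) for non-terminal residue "
    r"([A-Z]) (\d+)([A-Z]{3})"
)

def fractional_charge_residues(log_text):
    """Return (chain, residue_number, residue_name, raw_charge) tuples."""
    return [
        (chain, int(number), name, raw)
        for raw, chain, number, name in PATTERN.findall(log_text)
    ]

# Two warnings copied from the log, still run together on one line.
sample = (
    "Warning Fractional charge (035) for non-terminal residue A 52MSE "
    "MSEN Radius = 140 Charge = -052 "
    "Warning Fractional charge (-021) for non-terminal residue B 81ASP "
    "ASPN Radius = 140 Charge = -052"
)

for row in fractional_charge_residues(sample):
    print(row)

# The summary line of the log can be checked the same way:
# 104 positively and 114 negatively charged residues give a net charge of -10.
net_formal_charge = 104 - 114
print(net_formal_charge)
```

Run over the whole log, the same function would list every residue that Hex flagged, which helps when deciding whether the incomplete side chains should be repaired before docking.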
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
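The surface figures in this part of the log can be cross-checked with a little arithmetic. The sketch below assumes that the bracketed run of digits is the number of edges culled in each of the six cycles, with the separating commas lost in extraction (71657, 34961, 15915, 5330, 685, 11). That reading is an inference from the totals, not something the log states.

```python
# Assumed split of "[716573496115915533068511]" into six per-cycle counts.
culled_per_cycle = [71657, 34961, 15915, 5330, 685, 11]
total_culled = sum(culled_per_cycle)
print(total_culled)  # compare with "Culled 128559 short edges in 6 cycles"

# Triangle counts before and after culling, as printed in the log.
triangles_before, triangles_after = 338888, 81770
reduction = 100 * (triangles_before - triangles_after) / triangles_before
print(f"{reduction:.1f} per cent fewer triangles")

# Exterior plus interior skin grid cells.
exterior_cells, interior_cells = 201019, 216848
total_cells = exterior_cells + interior_cells
print(total_cells)  # compare with "Integration applied to 417867 cells"
```

The computed reduction falls between 75 and 76 per cent, which agrees with the "75 per cent" in the log if Hex truncates the figure rather than rounding it.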
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000
R = 075
R = 150
R = 225
R = 300
R = 375
R = 450
R = 525
R = 600
R = 675
R = 750
R = 825
R = 900
R = 975
R = 1050
R = 1125
R = 1200
R = 1275
R = 1350
R = 1425
R = 1500
R = 1575
R = 1650
R = 1725
R = 1800
R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000
R = 040
R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
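The size of the search reported in the docking log can be reproduced by hand. The sketch below rests on assumptions about punctuation lost in extraction:

- `Iterate[278121]` is read as 27 distance steps, 812 receptor rotations and 1 ligand rotation.
- `FFT[642448]` is read as a 64 x 24 x 48 grid.
- Distances and timings are read with two decimal places, for example 0.00 to 19.50 in steps of 0.75.

These readings are inferred because they reproduce the printed totals. They are not stated in the report.

```python
# Distance steps: 0.00 to 19.50 in steps of 0.75 (assumed decimal placement).
distance_steps = round(19.50 / 0.75) + 1

# Rotational samples, from "Receptor 812 (19Mb) Ligand 1 (1Mb)".
receptor_rotations, ligand_rotations = 812, 1
ffts = distance_steps * receptor_rotations * ligand_rotations

# Assumed split of "FFT[642448]" into three grid dimensions.
fft_grid = (64, 24, 48)
orientations = ffts * fft_grid[0] * fft_grid[1] * fft_grid[2]

# Stage timings in seconds, read as 53.66, 73.01, 277.87 and 15.45.
stage_seconds = [53.66, 73.01, 277.87, 15.45]
total_seconds = sum(stage_seconds)

print(distance_steps)        # number of "R = ..." lines in the first pass
print(ffts)                  # compare with "Done 21924 3D FFTs"
print(orientations)          # compare with "1616412672 orientations"
print(round(total_seconds))  # compare with "7 min 0 sec", i.e. 420 s
```

Under these assumptions the first pass evaluates every combination of distance step, receptor rotation and FFT grid point. This explains why it accounts for about 7 minutes of the reported total of 8 min 11 sec.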
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It is worth taking a closer look at the
matter at the gene level. A phylogenetic analysis can be very helpful in
understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will soon be possible to draw firm conclusions about the true picture of
its evolution, and that the genes responsible for pathogenesis can also be
identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of proteins present in the different
species, we performed homology modelling and took a step forward by
presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries; it is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking will be useful in finding a cure
for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and can hence
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. But we hope that these findings will go a
long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
99
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
76
Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
77
LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005
78
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair               RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
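The search-space figures reported in the docking log can be cross-checked with a short Python calculation. This is only a minimal sketch and not part of the original Hex run. It assumes that the bracketed values printed in the raw log as Iterate[278121] and FFT[642448] lost their separators during extraction and stand for Iterate[27, 812, 1] and FFT[64, 24, 48]; the reading of the three Iterate factors and all function and variable names are illustrative assumptions.

```python
from math import prod

# Values read from the Hex log. The grouping of the digits is an assumption,
# because the separators were lost when the log was extracted.
ITERATE = (27, 812, 1)   # assumed: distance steps, receptor rotations, ligand rotations
FFT_GRID = (64, 24, 48)  # assumed FFT grid dimensions


def distance_steps(start: float, stop: float, step: float) -> int:
    """Count the sampled intermolecular distances, both endpoints included."""
    return round((stop - start) / step) + 1


def search_space(iterate, fft_grid):
    """Return (number of 3D FFTs, total number of orientations sampled)."""
    n_ffts = prod(iterate)
    return n_ffts, n_ffts * prod(fft_grid)


# Distance range 0.00 to 19.50 in steps of 0.75, as stated in the log.
n_steps = distance_steps(0.0, 19.5, 0.75)
n_ffts, n_orientations = search_space(ITERATE, FFT_GRID)
print(n_steps, n_ffts, n_orientations)
```

Under these assumptions the script prints 27 21924 1616412672, which agrees with the 21924 3D FFTs and the 1616412672 orientations reported in the log.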
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that
although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
by studying the matter at the gene level. A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that it will be possible in the near future to draw firm conclusions about
the true picture of evolution, and that the genes responsible for
pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene
products and their functions.
To determine the unknown structure of the proteins present in the different
species, we performed homology modelling. We took a step forward by
presenting a theoretical model built with available online modelling tools.
Our study indicates that the HBeAG (Glycerate kinase) protein encoded by the
gene is one of the secondary causes of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit its
activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may serve as a stepping stone for such
discoveries, although it is only a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. It includes methods for collecting and analyzing data, as well as the
interpretation of the results as new biological information.
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in the future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study may be useful in finding a
cure for the infectious disease bird flu, and may also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, and so we can look forward to a better,
disease-free world.
The work presented here is just a small part of a big problem, and a lot of
work still needs to be done to establish a sound phylogenetic relationship
and a full-fledged cure for bird flu. We nevertheless hope that these
findings will go a long way and prove fruitful to anyone working in a
similar area.
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
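The "Fractional charge" warnings above can be read as the sum of the per-atom partial charges listed for each residue. The following is a minimal sketch of that check for residue B 17 LYS, assuming the decimal points lost in the log are restored as two-decimal values (for example "Charge = -052" read as -0.52). The function names and the tolerance are illustrative choices, not part of Hex.

```python
# Partial charges listed in the log for residue B 17 LYS.
# Assumption: the garbled values such as "-052" stand for -0.52.
lys_b17 = {
    "N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50,
    "CB": 0.04, "CG": 0.05, "CD": 0.05, "CE": 0.22, "H": 0.25,
}


def net_charge(atom_charges):
    """Sum the per-atom partial charges of one residue."""
    return sum(atom_charges.values())


def is_fractional(charge, tol=0.05):
    """True if the charge is further than tol from any integer (tol is arbitrary)."""
    return abs(charge - round(charge)) > tol


total = net_charge(lys_b17)
print(f"{total:.2f}", is_fractional(total))  # 0.35 True
```

The listed values sum to 0.35, against 0.34 in the warning; the small difference is presumably due to rounding in the printed charges. The listing for this residue has no NZ atom, which is consistent with it also being reported as an incomplete residue.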
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=64/64
Estart = 143501.36 kJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
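The search-space figures reported in the log can be cross-checked with a little arithmetic. The sketch below assumes that the separators lost in the log read as a distance range of 0.00 to 19.50 in steps of 0.75, `Iterate[27,812,1]` and `FFT[64,24,48]`. Reading the first `Iterate` entry as the number of distance steps is an interpretation, and the variable names are illustrative rather than Hex's own.

```python
# Cross-check of the 6D search-space size reported in the docking log.
# Assumption: the garbled values read as 0.00..19.50 in steps of 0.75,
# Iterate[27,812,1] and FFT[64,24,48].

r_min, r_max, r_step = 0.00, 19.50, 0.75

# Number of intermolecular distances sampled (both end points included).
n_distances = round((r_max - r_min) / r_step) + 1

# Interpreted as: distance steps, receptor rotations, ligand rotations.
iterate = (n_distances, 812, 1)
# Sizes of the three dimensions covered by each 3D FFT.
fft_grid = (64, 24, 48)

n_ffts = iterate[0] * iterate[1] * iterate[2]
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

print(n_distances)         # 27
print(n_ffts)              # 21924
print(total_orientations)  # 1616412672
```

Under these assumptions the products reproduce the "Done 21924 3D FFTs for 1616412672 orientations" line, and the 27 distance steps match the 27 `R =` lines printed during the scan.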
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although they are all closely related, each has an important
role in survival in different species. It is worth taking a closer look at
the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will become possible in the near future to draw firm conclusions about
the true picture of evolution, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we performed homology modelling. We then went a step further and
present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein coded by the
gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries; it is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
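As a small illustration of what "analyzing data" can mean in a phylogenetic study, the sketch below computes pairwise p-distances (the proportion of differing sites) between aligned sequences. The three sequences are hypothetical toy data rather than sequences from this report, and the function name is illustrative. Distance-based tree-building programs typically start from a matrix of this kind.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences of equal length (not data from this report).
aligned = {
    "strain_A": "ATGGACATTGAC",
    "strain_B": "ATGGACATCGAC",
    "strain_C": "ATGGATATCGAC",
}


def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


# One distance per unordered pair of strains.
distances = {
    (x, y): p_distance(aligned[x], aligned[y])
    for x, y in combinations(sorted(aligned), 2)
}

for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.3f}")
```

In this toy example strain_A and strain_C differ at two of twelve sites, while each of the other pairs differs at one, so strain_B sits between the other two.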
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in finding a
cure for the infectious disease bird flu; they will also open new avenues
for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. The same method can also be applied to
other infectious diseases, and hence we can look forward to a better,
disease-free world.
The work presented is just a small part of a big problem, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE
79
MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Assuming CProgram FilesHex 50examples2B8Npdb is a PDB file
Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYR
80
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000      0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
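The size of the search reported in the docking log can be checked with a few lines of arithmetic. The sketch below is only an illustration and rests on assumptions: punctuation is missing from the log as printed, so reading the iteration counts as 27, 812 and 1, the FFT dimensions as 64, 24 and 48, and the stage timings as seconds with two decimal places is an interpretation. The interpretation is supported by the fact that the products and sums reproduce the totals the log itself reports. All function and variable names are hypothetical.

```python
from math import prod

# Values read from the Hex log. Comma and decimal placement are assumptions,
# because punctuation is missing from the log as printed.
ITERATE = (27, 812, 1)    # distance steps, receptor rotations, ligand rotations
FFT_GRID = (64, 24, 48)   # dimensions of each 3D FFT
STAGE_SECONDS = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}


def search_space(iterate, fft_grid):
    """Return (number of 3D FFTs, number of trial orientations)."""
    n_ffts = prod(iterate)
    return n_ffts, n_ffts * prod(fft_grid)


def radial_steps(r_max=19.50, step=0.75):
    """Intermolecular distances scanned from 0 to r_max inclusive."""
    count = round(r_max / step) + 1
    return [round(i * step, 2) for i in range(count)]


n_ffts, n_orientations = search_space(ITERATE, FFT_GRID)
total_seconds = sum(STAGE_SECONDS.values())

print("3D FFTs:", n_ffts)
print("orientations:", n_orientations)
print("distance steps:", len(radial_steps()))
print("search time (s):", round(total_seconds, 2))
```

Under these assumptions the number of distance steps equals the first iteration count, and the summed stage times come to about seven minutes, which agrees with the duration the log gives for the 3D search.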
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that, although they are all closely related, they play an important
role in the survival of the virus in different species. It would be
interesting to take a closer look at the matter by studying it at the gene
level. A phylogenetic analysis can be very helpful in understanding the
evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that the strains evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of its evolution and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we carried out homology modelling. We took a step forward by
presenting a theoretical model built with available online modelling tools.
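Template selection for homology modelling usually starts from how similar the target sequence is to a protein of known structure. The following is a minimal sketch, assuming a pairwise alignment of target and template is already available. The sequences are hypothetical and are not taken from this report, and the function name is illustrative only. An often-quoted rule of thumb is that models become unreliable when identity falls much below roughly 30 per cent.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percentage of identical residues over the ungapped alignment columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # a gap in either sequence is not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, used only to exercise the function.
target = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"
print(f"{percent_identity(target, template):.1f}% identity")
```

Gapped columns are excluded from the denominator here. Other conventions exist, such as dividing by the full alignment length, so the definition in use should be stated alongside any reported value.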
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary causes of the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be just a stepping stone towards such
discoveries. It is a small finding on a big issue.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
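As an illustration of how such relationships can be derived from sequence data, the sketch below computes pairwise p-distances (the fraction of differing sites) between aligned sequences and groups them with UPGMA average-linkage clustering. It is a toy example under stated assumptions: the four sequences and their names are invented, the function names are hypothetical, and UPGMA is used only because it is simple. It is not necessarily the method applied in this report.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs: dict) -> str:
    """Cluster sequences by average distance; return a Newick topology."""
    # Each cluster label maps to the original sequence names it contains.
    clusters = {name: [name] for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }

    def average(c1, c2):
        # UPGMA: mean distance over all member pairs of the two clusters.
        total = sum(dist[frozenset((m, n))]
                    for m in clusters[c1] for n in clusters[c2])
        return total / (len(clusters[c1]) * len(clusters[c2]))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merged = f"({c1},{c2})"
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters)) + ";"


# Invented toy alignment, used only to exercise the functions.
toy = {
    "strain_A": "ACDEFGHIKL",
    "strain_B": "ACDEFGHIKV",
    "strain_C": "MNPQRSTVWY",
    "strain_D": "MNPQRSTVWF",
}
print(upgma(toy))
```

The returned string carries the tree topology only. Branch lengths, substitution models and bootstrap support are left out to keep the sketch short.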
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
Once new drug targets are identified, they will open new vistas for further
drug development. The findings of our docking study will be useful in
finding a cure for the infectious disease bird flu, and they will also open
new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, and so we can look forward to a better,
disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We nevertheless hope that these findings
will go a long way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
Warning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
81
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050
82
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50   R = 5.25   R = 6.00
R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75   R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
------------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
----  --------  --------  --------  --------  --------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
Clst  Soln  Models    Etotal   Eshape   Eforce     Eair   Vshape  Vclash  Bmp    RMS
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
   1     1  000000       0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
------------------------------------------------------------------------------
   1     1  000000       0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
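The size of the search reported in the docking log can be reproduced with a little arithmetic. The Python sketch below is only a cross-check written for this report, not part of the docking program. It assumes that the garbled "Total 6D space" entry reads Iterate[27,812,1] x FFT[64,24,48], that is, 27 radial steps (distance range 0.00 to 19.50 in steps of 0.75), 812 receptor rotations, 1 ligand rotation and a 64 x 24 x 48 FFT grid. All constant and function names are illustrative.

```python
import math

# Values read from the docking log. The punctuation was lost in the
# extracted text, so the decimal points and separators are assumptions.
R_START, R_STOP, R_STEP = 0.00, 19.50, 0.75  # intermolecular distance range
RECEPTOR_ROTATIONS = 812                     # initial rotational increments
LIGAND_ROTATIONS = 1
FFT_GRID = (64, 24, 48)                      # assumed reading of FFT[642448]


def radial_steps(start: float, stop: float, step: float) -> int:
    """Number of distance samples from start to stop inclusive."""
    return round((stop - start) / step) + 1


def search_space():
    """Return (radial steps, number of 3D FFTs, total orientations)."""
    n_r = radial_steps(R_START, R_STOP, R_STEP)
    n_fft = n_r * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
    total = n_fft * math.prod(FFT_GRID)
    return n_r, n_fft, total


n_r, n_fft, total = search_space()
print(n_r, n_fft, total)
```

With these assumed readings the sketch yields 27 radial steps, 21924 3D FFTs and 1616412672 orientations, the same counts that the log reports for the initial 3D FFT search.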
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that, although they are all closely related, they play an important role in
survival in different species. It would be interesting to take a closer look
at the matter by studying it at the gene level. A phylogenetic analysis can
be very helpful in understanding the evolutionary pattern.
We have noticed that the same genes are present in all strains, which
suggests that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
that in the near future it will be possible to draw a conclusive picture of
its evolution and to identify the genes responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of the protein present in the different
species, we carried out homology modelling and went a step further to
present a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is one of the secondary reasons for the pathogenicity of HBV. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs can be developed.
Future prospects
The work presented in this report may be a stepping stone towards such
discoveries. It is a small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
earth. This includes methods for collecting and analyzing data, as well as
the interpretation of those results as new biological information.
The purpose of modeling is to help drug developers and biotechnologists
develop drugs more efficiently and more effectively in future by analyzing
the modeled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking study will be useful in finding a
cure for the infectious disease bird flu, and will also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens, so that
possible therapeutic sites can be identified in them. A similar method can
also be applied to other infectious diseases, and hence we can look forward
to a better, disease-free world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and will prove fruitful to anyone working in a similar area.
Abbreviations
CSA      Catalytic Site Atlas
EMBOSS   European Molecular Biology Open Software Suite
NCBI     National Center for Biotechnology Information
NDB      Nucleic Acid Database
ORF      Open Reading Frame
OTU      Operational Taxonomic Unit
PDB      Protein Data Bank
PHYLIP   Phylogeny Inference Package
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS CE   Radius = 1.70  Charge = 0.22
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N    Radius = 1.40  Charge = -0.52
MSE CA   Radius = 1.50  Charge = 0.14
MSE C    Radius = 1.40  Charge = 0.53
MSE O    Radius = 1.50  Charge = -0.50
MSE CB   Radius = 1.70  Charge = 0.04
MSE CG   Radius = 1.70  Charge = 0.09
MSE SE   Radius = 1.90  Charge = 0.32
MSE CE   Radius = 1.90  Charge = 0.01
MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS CE   Radius = 1.70  Charge = 0.22
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N    Radius = 1.40  Charge = -0.52
MSE CA   Radius = 1.50  Charge = 0.14
MSE C    Radius = 1.40  Charge = 0.53
MSE O    Radius = 1.50  Charge = -0.50
MSE CB   Radius = 1.70  Charge = 0.04
MSE CG   Radius = 1.70  Charge = 0.09
MSE SE   Radius = 1.90  Charge = 0.32
MSE CE   Radius = 1.90  Charge = 0.01
MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS CE   Radius = 1.70  Charge = 0.22
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N    Radius = 1.40  Charge = -0.52
ASP CA   Radius = 1.50  Charge = 0.25
ASP C    Radius = 1.40  Charge = 0.53
ASP O    Radius = 1.50  Charge = -0.50
ASP CB   Radius = 1.70  Charge = -0.21
ASP H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N    Radius = 1.40  Charge = -0.52
MSE CA   Radius = 1.50  Charge = 0.14
MSE C    Radius = 1.40  Charge = 0.53
MSE O    Radius = 1.50  Charge = -0.50
MSE CB   Radius = 1.70  Charge = 0.04
MSE CG   Radius = 1.70  Charge = 0.09
MSE SE   Radius = 1.90  Charge = 0.32
MSE CE   Radius = 1.90  Charge = 0.01
MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N    Radius = 1.40  Charge = -0.52
LYS CA   Radius = 1.50  Charge = 0.23
LYS C    Radius = 1.40  Charge = 0.53
LYS O    Radius = 1.50  Charge = -0.50
LYS CB   Radius = 1.70  Charge = 0.04
LYS CG   Radius = 1.70  Charge = 0.05
LYS CD   Radius = 1.70  Charge = 0.05
LYS CE   Radius = 1.70  Charge = 0.22
LYS H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N    Radius = 1.40  Charge = -0.52
MSE CA   Radius = 1.50  Charge = 0.14
MSE C    Radius = 1.40  Charge = 0.53
MSE O    Radius = 1.50  Charge = -0.50
MSE CB   Radius = 1.70  Charge = 0.04
MSE CG   Radius = 1.70  Charge = 0.09
MSE SE   Radius = 1.90  Charge = 0.32
MSE CE   Radius = 1.90  Charge = 0.01
MSE H    Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N    Radius = 1.40  Charge = -0.52
THR CA   Radius = 1.50  Charge = 0.27
THR C    Radius = 1.40  Charge = 0.53
THR O    Radius = 1.50  Charge = -0.50
THR CB   Radius = 1.50  Charge = 0.21
THR H    Radius = 0.00  Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053
83
ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021
84
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N A
Found 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 seconds
85
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds
Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)R = 000
86
R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1
3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)
87
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package
ASPO Radius = 150 Charge = -050
ASPCB Radius = 170 Charge = -021
ASPH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 291MSE
MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014
MSEC Radius = 140 Charge = 053
MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004
MSECG Radius = 170 Charge = 009
MSESE Radius = 190 Charge = 032
MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025
Warning Fractional charge (034) for non-terminal residue B 370LYS
LYSN Radius = 140 Charge = -052
LYSCA Radius = 150 Charge = 023
LYSC Radius = 140 Charge = 053
LYSO Radius = 150 Charge = -050
LYSCB Radius = 170 Charge = 004
LYSCG Radius = 170 Charge = 005
LYSCD Radius = 170 Charge = 005
LYSCE Radius = 170 Charge = 022
LYSH Radius = 000 Charge = 025
Warning Fractional charge (035) for non-terminal residue B 375MSE
MSEN Radius = 140 Charge = -052
MSECA Radius = 150 Charge = 014
MSEC Radius = 140 Charge = 053
MSEO Radius = 150 Charge = -050
MSECB Radius = 170 Charge = 004
MSECG Radius = 170 Charge = 009
MSESE Radius = 190 Charge = 032
MSECE Radius = 190 Charge = 001
MSEH Radius = 000 Charge = 025
Warning Fractional charge (023) for non-terminal residue B 404THR
THRN Radius = 140 Charge = -052
THRCA Radius = 150 Charge = 027
THRC Radius = 140 Charge = 053
THRO Radius = 150 Charge = -050
THRCB Radius = 150 Charge = 021
THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10
Warning Using PDB CONECT records to define non-standard bonds
>2B8N A
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJmol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600
R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100
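The counts printed in the docking log can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that the punctuation lost from the log reads as a distance range of 0.00 to 19.50 in steps of 0.75, as Iterate[27,812,1] x FFT[64,24,48], and as phase timings of 53.66 s, 73.01 s, 277.87 s and 15.45 s. These readings and all variable names are assumptions, not values confirmed by the report.

```python
from math import prod

# Assumed readings of values whose punctuation was lost in the log.
r_start, r_stop, r_step = 0.00, 19.50, 0.75   # "distance range = 000 to 1950 with steps of 075"
iterate_dims = (27, 812, 1)                    # "Iterate[278121]"
fft_dims = (64, 24, 48)                        # "FFT[642448]"
phase_seconds = (53.66, 73.01, 277.87, 15.45)  # "Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s"

# Number of intermolecular distance steps, counting both end points.
n_distances = round((r_stop - r_start) / r_step) + 1

# Product of the iterated dimensions; the log reports this many 3D FFTs.
n_ffts = prod(iterate_dims)

# Size of the full 6D search space reported in the log.
n_orientations = n_ffts * prod(fft_dims)

# Skin grid cells: exterior plus interior.
n_cells = 201019 + 216848

# Net formal charge from the counted positive and negative residues.
net_charge = 104 - 114

# Total search time in whole minutes and seconds.
minutes, seconds = divmod(round(sum(phase_seconds)), 60)

print(n_distances)       # 27
print(n_ffts)            # 21924
print(n_orientations)    # 1616412672
print(n_cells)           # 417867
print(net_charge)        # -10
print(minutes, seconds)  # 7 0
```

Under these readings every derived number matches a figure printed in the log (27 distance steps, 21924 3D FFTs, 1616412672 orientations, 417867 skin cells, a net formal charge of -10, and a search time of 7 min 0 sec), which supports the assumed placement of the lost decimal points and commas.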
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude
that although they are all closely related, they play an important role in
survival in different species. It is interesting to take a closer look at
the matter by studying it at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that they evolved together.
With the completion of the ongoing gene sequencing project on HBV, we hope
it will be possible in the near future to draw firm conclusions about the
true picture of evolution, and that the genes responsible for pathogenesis
can also be identified.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structure of a protein present in the different
species, we performed homology modelling. We then took a step forward and
presented a theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the gene is a secondary cause of the pathogenicity of HBV. We therefore
tried to dock this protein with an appropriate ligand in order to inhibit
its activity, as a basis on which drugs can be developed.
R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675
R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275
R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875
R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----
-------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
-------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100
Conclusion
After analyzing the protein sequences of Hepatitis B virus (HBV), we
conclude that although they are all closely related, each plays an
important role in survival in different species. It would be interesting to
take a closer look at the matter at the gene level, where a phylogenetic
analysis can be very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests
that the strains evolved together.
Once the ongoing gene sequencing project on HBV is finished, we hope it
will be possible to draw firm conclusions about the true picture of
evolution in the near future, and to identify the genes responsible for
pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To determine the unknown structures of the proteins present in the
different species, we carried out homology modelling and present a
theoretical model built with available online modelling tools.
Our study indicates that the HBeAg (glycerate kinase) protein encoded by
the viral genome is one of the secondary causes of HBV pathogenicity. We
therefore tried to dock this protein with an appropriate ligand in order to
inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may be just a stepping stone towards
further discoveries; it is a small finding on a large problem.
Phylogenetics is the field of biology that deals with identifying and
understanding the relationships between the many different kinds of life on
Earth. It includes methods for collecting and analyzing data, as well as
the interpretation of the results as new biological information.
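As a small illustration of the kind of data analysis that phylogenetics relies on, the sketch below computes the pairwise p-distance (the proportion of differing sites) between aligned sequences, which is one common starting point for distance-based tree building. It is not the method used in this report; the function names and the toy sequences are hypothetical and are not real HBV data.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    names = sorted(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m, n in combinations(names, 2)}


# Hypothetical toy alignment, for illustration only (not real HBV data).
toy_alignment = {
    "strain_A": "MQLFHLCLII",
    "strain_B": "MQLFHLCLIV",
    "strain_C": "MQLYHL-LIV",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

A matrix of this kind could then be passed to a distance-based method such as neighbour joining; in practice, corrected distances are usually preferred over the raw p-distance when the sequences are more divergent.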
The purpose of modelling is to help drug developers and biotechnologists
develop drugs more efficiently and effectively in future by analyzing the
modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug
development. The findings of our docking may be useful in finding a cure
for the infectious disease bird flu, and may also open new avenues for
finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can
aid the drug discovery process.
Finally, a similar process can be applied to other pathogens to identify
possible therapeutic sites in them. A similar method can also be applied to
other infectious diseases, so we can look forward to a better, disease-free
world.
The work presented is just a small part of a big issue, and a lot of work
still needs to be done to establish a good phylogenetic relationship and a
full-fledged cure for bird flu. We hope that these findings will go a long
way and prove fruitful to anyone working in a similar area.
Abbreviations
CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds
Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040R = 075Estart = 14350136 -gt rank 5
Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000
Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds
88
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
89
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
Conclusion
90
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
After analyzing protein sequence of Hepatitis B virus we come to
conclusion that though they all are closely related they have an important
role in survival in different species It is interesting to have closer look at the
matter by studying at the gene level A phylogenetic analysis can be very
helpful in understanding the evolutionary pattern
We have noticed that same genes are present in all strains this shows that
are they evolved together
With the finishing of the ongoing gene sequencing
project on HBV we hope it will be possible to draw conclusive decision
about the true picture of evolution in near future and gene responsible for
pathogenesis can also be identified
Complete inference can only be drawn based on a
comprehensive list of the gene products and their function
In order to find out unknown structure of protein
present in the different species we do homology modelling We forward
step to present a theoretical model using available online modelling tools
As we study that HBeAG (Glycerate kinase )
protein that is coded by gene is one of the second reasons of pathogenicity
of HBV So we tried to dock this protein with appropriate ligand in order to
inhibit their activity on the basis of which the drugs have to be developed
91
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
94
BIBLIOGRAPHY
95
[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses
[2] - F V Chisari C Ferrari
Department of Molecular and Experimental Medicine Scripps Research
Institute La Jolla California 92037 USA
[3] -C Seeger W S Mason
Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA
c_seegerfcccedu
[4]- plumbed
[5]- Howard Hughes Medical Institute Department of Biochemistry and
Molecular Biophysics Columbia University New York New York 10032
USA
Reprint requests to Barry Honig Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics Columbia
University New York NY 10032 USA
[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining
multiple structure and sequence alignments to improve sequence detection
and alignment Application to the SH2 domains of Janus kinases Proc Natl
Acad Sci 98 14796ndash14801 [PubMed]
96
Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated
structure-based prediction of functional sites in proteins Applications to
assessing the validity of inheriting protein function from homology in
genome annotation and to protein docking J Mol Biol 311 395ndash408
[PubMed]
Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller
W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new
generation of protein database search programs Nucleic Acids Res 25
3389ndash3402 [PubMed]
Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas
M Bucher P Cerutti L Corpet F Croning MD et al 2000 InterPro
mdashAn integrated documentation resource for protein families domains and
functional sites Bioinformatics
[7]- Cheogenomics Laboratory Research Group on Biomedical Informatics
Institut Municipal Investigacioacute Medica and Universitat Pompeu Fabra
Passeig Maritim de la Barceloneta 37-49 08003 Barcelona (Catalonia)
Spain
[8]- Computational Sciences Department of Chemistry Nerviano Medical
Sciences Viale Pasteur 10 20014 Nerviano (MI) Italy
romanokroemersanofi-aventiscom
97
Abbreviation
98
CSA Catalytic Site Atlas
Emboss European Molecular Biology Open Software Suit
NCBI National Centre for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
Phylip Phylogeny Inference Package
99
Future prospects
92
The work presented in this report might just be a stepping stone for any such
discoveries The present work might be small finding of big issue
Phylogenetics is that field of biology which deals with identifying and
understanding the relationships between the many different kinds of life on
earth This includes methods for collecting and analyzing data as well as
interpretation of those results as new biological information
The purpose of modeling is to help the Drug developers and
Biotechnologists to develop the drug more efficiently and with more
effectiveness in future by analyzing the modeled structure of protein
As the new drugs target would be identified it will open new vistas for
further drug development The finding of our docking will be useful in
finding a cure for the infectious disease bird flu also it will open new
avenues for finding other possible drug targets in influenza A virus
The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process
Finally similar process can be applied on other pathogens and hence
possible therapeutic sites can be identified in them Similar method can also
be applied to other infectious diseases and hence we can look forward to a
better disease free world
The work presented is just a small part of big issue and lots of work still
needs to be done to establish a good phylogenetic relationship and full
93
fledged cure for bird flu But we are hoping that these findings will go long
way and will prove fruitful to any going in a similar area
Abbreviations
CSA Catalytic Site Atlas
EMBOSS European Molecular Biology Open Software Suite
NCBI National Center for Biotechnology Information
NDB Nucleic Acid Database
ORF Open Reading Frame
OTU Operational Taxonomic Unit
PDB Protein Data Bank
PHYLIP Phylogeny Inference Package