HEPATITIS VIRUS

Introduction

Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science. It involves the technology that uses
computers for storage, retrieval, manipulation, and distribution of
information related to biological macromolecules such as DNA, RNA, and
proteins. Bioinformatics is limited to sequence, structural, and functional
analysis of genes and genomes and their corresponding products, and is often
considered computational molecular biology. It consists of two subfields: the
development of computational tools and databases, and the application of
these tools and databases in generating biological knowledge to better
understand living systems. These tools are used in three areas of genomic
and molecular biological research: molecular sequence analysis, molecular
structural analysis, and molecular functional analysis. The areas of sequence
analysis include sequence alignment, sequence database searching, motif
and pattern discovery, gene and promoter finding, reconstruction of
evolutionary relationships, and genome assembly and comparison.
Structural analyses include protein and nucleic acid structure analysis,
comparison, classification, and prediction. Functional analysis includes
gene expression profiling, protein-protein interaction prediction, protein
subcellular localization prediction, metabolic pathway reconstruction, and
simulation. The three aspects of bioinformatics analysis are not isolated but
often interact to produce integrated results. For example, protein structure
prediction depends on sequence alignment data; clustering of gene
expression profiles requires the use of phylogenetic tree construction
methods derived in sequence analysis; and sequence-based prediction is related
to functional analysis of co-expressed genes.

The first major bioinformatics project was undertaken by Margaret Dayhoff in
1965, who developed the first protein sequence database, called the Atlas of
Protein Sequence and Structure. Subsequently, in the early 1970s, the
Brookhaven National Laboratory established the Protein Data Bank for
archiving three-dimensional protein structures. At its onset, the database
stored fewer than a dozen protein structures, compared to more than 30,000
structures today. The first sequence alignment algorithm was developed by
Needleman and Wunsch in 1970. This was a fundamental step in the development
of the field of bioinformatics, which paved the way for the routine sequence
comparisons and database searching practiced by modern biologists.
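As an illustration of the kind of global alignment that Needleman and Wunsch
introduced, the following is a minimal sketch of the dynamic-programming score
calculation. The function name and the scoring values (match, mismatch, and a
linear gap penalty) are assumptions chosen for the example, not parameters
taken from the original publication or from any tool used in this work.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (simple linear gap penalty).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Only the score is computed here; recovering the alignment itself would
additionally require storing the full matrix and tracing back from the last
cell.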

A recent advance in bioinformatics is molecular modeling, which is
aimed at understanding structure-function and structure-property relationships
in physicochemical processes and pharmaceuticals, and thus has become
increasingly important for finding and designing new drugs. In fact,
computers are playing an important role in new drug discovery and drug
design.

HEPATITIS

Hepatitis (plural hepatitides) implies injury to the liver, characterized by
the presence of inflammatory cells in the liver tissue. Etymologically, the
word comes from ancient Greek hepar or hepato-, meaning liver, and the suffix
-itis, denoting inflammation. The condition can be self-limiting, healing on
its own, or can progress to scarring of the liver. Hepatitis is acute when it
lasts less than 6 months and chronic when it persists longer. A group of
viruses known as the hepatitis viruses cause most cases of liver damage
worldwide. Hepatitis can also be due to toxins (notably alcohol), other
infections, or autoimmune processes.

It may run a subclinical course, in which the affected person may not feel
ill. The patient becomes unwell and symptomatic when the disease impairs
liver functions that include, among other things, screening of harmful
substances, regulation of blood composition, and production of bile to help
digestion.

Causes

Acute hepatitis

Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes
simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted
fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline, and
many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis

Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C
(hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally
mimic chronic hepatitis

Viral hepatitis

A virus is a particle which is smaller than bacteria and contains complex
genetic information in the form of DNA or RNA. This genetic material allows
the virus to infect bacteria or living cells and set up the machinery to
reproduce itself, leading to destruction of the cell in which it resides. To
date, five viruses, labeled A through E, have been identified which appear to
cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C, and D are transmitted by direct
injection into the bloodstream (through any method of injection under the
skin). The term viral hepatitis describes any one of the illnesses caused by
the five viruses mentioned and consists of an infection of liver cells which
leads to damage of the liver over days in some cases, but over many years in
others. Thirty years ago, none of the hepatitis viruses had been identified.
In the 1960s, transfusion-related viral hepatitis was extremely common, with
30% of patients receiving blood products becoming infected. By 1970, a
blood test for the Australia antigen was developed, which appeared to
identify those infected with one hepatitis virus, which we now call hepatitis
B. The investigator who discovered the Australia antigen, the protein which
makes up the coat of the virus and which is now called the hepatitis B
surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding
of viral hepatitis has grown tremendously since the discovery of the
Australia antigen.

Currently, 11 viruses are recognized as causing hepatitis. Two are herpes
viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are
hepatotropic viruses. EBV and CMV cause mild, self-resolving forms of
hepatitis with no permanent hepatic damage. Both viruses cause the typical
infectious mononucleosis syndrome of fatigue, nausea, and malaise.

Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered
viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E
(formerly called enterically transmitted NANB hepatitis) are transmitted by
fecal-oral contamination. The most important types include hepatitis B
(sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B
hepatitis), and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common,
especially in children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the
disease. Fulminant hepatitis is rare: 0.1% of cases.
The virus enters via the gut, replicates in the alimentary tract, and spreads
to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks
preceding the onset of symptoms.
Worldwide distribution; endemic in most countries. The incidence in first
world countries is declining. There is an especially high incidence in
developing countries and rural areas. In rural areas of South Africa, the
seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years.
Fulminant hepatitis occurs in pregnant women; the mortality rate is high (up
to 40%).
Similar to hepatitis A: the virus replicates in the gut initially, before
invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to
establish infection.
Little is known yet; the incidence of infection appears to be low in first
world countries.

Hepatitis C

Putative togavirus related to the flaviviruses and pestiviruses; thus
probably enveloped. Has a ssRNA genome.
Does not grow in cell culture, but can infect chimpanzees.
Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B, but 50% of
individuals develop chronic infection following exposure, which can lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Incidence: endemic worldwide; high incidence in Japan, Italy, and Spain.
In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus which requires hepatitis B as a helper virus in order to
replicate. Infection therefore only occurs in patients who are already
infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV;
delta antigen is associated with virus particles.
ssRNA genome.
Identified in intravenous drug abusers.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B,
non-C hepatitis has been called hepatitis G virus. It was implicated as a
cause of parenterally transmitted hepatitis, but is no longer believed to be a
major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus which is capable of
infecting human liver cells and other cells in the body once it gains access
to the bloodstream. One of the most interesting features of the hepatitis B
virus is that the virus itself does not damage the liver; the damage is
caused by the individual's own immune system attacking the virus-infected
cells. Since liver damage from the virus may be very slight, many patients are
called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver
function tests. While many individuals remain healthy for many years or a
lifetime, others develop chronic hepatitis, cirrhosis, and occasionally liver
cell cancer. These outcomes are linked to the virus and its effects, although
it is unlikely that the virus directly causes cancer. Those patients who
develop hepatitis (damage to liver cells with inflammation) do so on account
of the body's normal inclination to attack the foreign proteins contained in
viruses and in the cells in which the viruses are found. This process, called
the immune response, determines the pace and the severity of the liver cell
injury in this condition and will be described in more detail below.

Since the identification of the hepatitis B virus, several other viruses which
are nearly identical have been identified in Eastern woodchucks, ground
squirrels, and Peking ducks. The members of this virus family, termed the
hepadnaviruses, have similar life cycles to that observed in man and can
serve as animal models, allowing further study of these unique
disease-causing agents.

Classification and general features

Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans),
Avihepadnavirus (e.g. duck hepatitis B virus)
Size: 42 nm. Virions (also known as Dane particles) contain a circular
dsDNA genome.

Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months.
Insidious onset of symptoms. Tends to cause a more severe disease than
hepatitis A.
Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and
virus particles, as well as excess viral surface protein, are shed in large
amounts into the blood. Viraemia is prolonged and the blood of infected
individuals is highly infectious.

Complications

1) Persistent infection
Following acute infection, approximately 5% of infected individuals fail to
eliminate the virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes and ongoing liver damage occurs
because of the host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver
damage.
Chronic active hepatitis - there is aggressive destruction of liver tissue and
rapid progression to cirrhosis or liver failure.

2) Hepatocellular carcinoma
Patients who become persistently infected are at risk of developing
hepatocellular carcinoma (HCC). HBV is thought to play a role in the
development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B
b) Virus DNA can be identified in hepatocellular carcinoma cells
c) Virus DNA can integrate into the host chromosome

3) Fulminant hepatitis
Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide, there are 450 million persistent carriers of hepatitis B, 50
million of whom are in Africa. Carriage rates vary markedly in different
areas. In South Africa, infection is much more common in rural communities
than in the cities. Hepatitis B is parenterally transmitted:

1) Blood
Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation
2) Sexual intercourse
3) Horizontal transmission: in children, families, close personal contact
This is the major mode of transmission in South Africa, where the majority
of individuals become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental
4) Vertical transmission: perinatal transmission from a carrier mother to
her baby
Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact
(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm
spheres and tubules. Its presence in serum indicates that virus replication is
occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the
blood. Its presence in serum indicates that a high level of viral replication
is occurring.
3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response
1) Surface antibody (anti-HBs) becomes detectable late in convalescence
and indicates immunity following infection. It remains detectable for life and
is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It
indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in both
chronic carriers and those who clear the infection. Its presence
indicates exposure to HBV.

Fig. Hepatitis B virus in serum

Prevention

1) Active immunization
Two types of vaccine are available:
Serum derived - prepared from HBsAg purified from the serum of
HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three
doses induces protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants
receive 3 doses, at 6, 10, and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with
HBV:
1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers

2) Passive antibody
Hepatitis B immune globulin should be administered to non-immune
individuals following single-episode exposure to HBV-infected blood, for
example needle stick injuries.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are
not aware of the infection for several weeks, until they develop symptoms of
acute hepatitis such as nausea, fatigue, and jaundice (yellowing of the eyes).
The acute hepatitis phase may last for several weeks and occasionally leads
to hospitalization, but acute hepatitis B resolves completely in 95% of those
infected.

Others, who do not develop significant symptoms following exposure, may not
be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status
of the person's immune system at the time of exposure. Most chronic carriers,
or those with chronic hepatitis B, are not aware of their ongoing infection,
although some have persistent fatigue.

Molecular virology

The genome is circular, double-stranded, and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction
and no noncoding regions. The minus strand is unit length and has a protein
covalently attached to the 5' end. The other strand, the plus strand, is
variable in length, but is less than unit length, and has an
RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and
circularity is maintained by cohesive ends (Strauss 2002). The four
overlapping open reading frames (ORFs) in the genome are responsible for
the transcription and expression of seven different hepatitis B proteins. The
transcription and translation of these proteins is through the use of multiple
in-frame start codons. The HBV genome also contains parts that regulate
transcription, determine the site of polyadenylation, and mark a specific
transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell which
is capable of supporting its replication. Although hepatocytes are known to be
the most effective cell type for replicating HBV, other types of cells in the
human body have been found to be able to support replication to a lesser
degree.

The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through
recognition of a cell surface receptor that has yet to be identified (Garces
HBVP). The DNA then enters the nucleus, where it is known to form a
covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase
II of a longer-than-genome-length RNA called the pregenome and of shorter,
subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs
are translated by ribosomes attached to the cell's endoplasmic reticulum, and
the proteins that are destined to become HBV surface antigens in the viral
envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which
then binds to a specific site at the 3' end of its own transcript, where viral
DNA synthesis eventually occurs. At the same time as capsid formation, the
RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recirculated to the nucleus, where
the process is repeated, resulting in the accumulation of 10 to 30 molecules
of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the one infectious
particle found within the body of an infected patient. This virion has a
diameter of 42 nm, and its outer envelope contains a high quantity of
hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid,
which is made up of 180 hepatitis B core proteins in an icosahedral
arrangement. The nucleocapsid also contains at least one hepatitis B
polymerase protein (P) along with the HBV genome.

In infected people, virions actually compose a small minority of HBV-derived
particles. Large numbers of smaller subviral particles are also present,
and usually outnumber the virions in a ratio of 100:1. These two subviral
particles, the hepatitis B filament and the hepatitis B sphere, are often
referred to as a group named surface antigen particles. The sphere contains
both middle and small surface proteins, whereas the filament also includes
large hepatitis B surface protein. The absence of the hepatitis B core,
polymerase, and genome makes these particles non-infectious. High levels of
these non-infectious particles can be found during the acute phase of the
infection. Since the non-infectious particles present the same sites as the
virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence
of high levels of non-infectious particles may allow the infectious viral
particles to travel undetected by antibodies through the bloodstream
(Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV
genome:

Hepatitis B surface antigen (HBsAg) - There are three different types of
hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or
SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis
B surface antigen (LHBsAg). HBsAg is the smallest protein of the hepatitis
B surface proteins and has historically been known as the Australia antigen
(Au antigen). It is very hydrophobic, containing four transmembrane-spanning
regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high
quantities. It also contains a highly antigenic epitope which may be
responsible for triggering the immune response. Regardless of the high
antigenicity and prevalence of these particles, the immune system appears
basically oblivious to their presence.

Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be
detected directly by blood test; this antigen can only be isolated by
analyzing an infected hepatocyte. A 185-amino-acid protein, it is expressed in
the cytoplasm of infected cells and is closely associated with nucleocapsid
assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg) - The e antigen is named for its early
appearance during an acute HBV infection. Thought to be located in the core
structure of the virus particle, this antigen can be detected by blood test.
If found, it is usually indicative of complete virus particles in circulation
(Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B
virus (HBV), which causes a necroinflammatory liver disease of variable
duration and severity. Chronically infected patients with active liver disease
carry a high risk of developing cirrhosis and hepatocellular carcinoma.
Hepatitis B is caused by hepatitis B virus (HBV), a double-stranded circular
DNA virus of complex structure. HBV is classified as an orthohepadnavirus
within the family Hepadnaviridae. Serum of individuals infected with hepatitis
B contains three distinct antigen particles: a spherical 22 nm particle; a
42 nm particle (containing DNA and DNA polymerase) called the Dane particle;
and tubular or filamentous particles that vary in length. Of these, only the
Dane particle is the infective form of the virus. Hepatitis B is normally
transmitted by blood transfusion, contaminated equipment, drug users'
unsterile needles, or any body secretion.

The immune response to HBV-encoded antigens is responsible both for viral
clearance and for disease pathogenesis during this infection. While the
humoral antibody response to viral envelope antigens contributes to the
clearance of circulating virus particles, the cellular immune response to the
envelope, nucleocapsid, and polymerase antigens eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the
development of a weak antiviral immune response to the viral antigens.
While neonatal tolerance probably plays an important role in viral
persistence in patients infected at birth, the basis for poor responsiveness
in adult-onset infection is not well understood and requires further analysis.
Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an
ineffective immune response, as can the incomplete downregulation of viral
gene expression and the infection of immunologically privileged tissues.
Chronic liver cell injury and the attendant inflammatory and regenerative
responses create the mutagenic and mitogenic stimuli for the development
of DNA damage that can cause hepatocellular carcinoma. Elucidation of the
immunological and virological basis for HBV persistence may yield
immunotherapeutic and antiviral strategies to terminate chronic HBV
infection and reduce the risk of its life-threatening sequelae.

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections
of the liver. Transient infections run a course of several months, and chronic
infections are often lifelong. Chronic infections can lead to liver failure
with cirrhosis and hepatocellular carcinoma. The replication strategy of these
viruses has been described in great detail, but virus-host interactions
leading to acute and chronic disease are still poorly understood. Studies on
how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing
inflammation of the liver are still at an early stage, and the role of the
virus in liver cancer is still elusive. The state of knowledge in this very
active field is therefore reviewed, with an emphasis on past accomplishments
as well as goals for the future.


Homology, or comparative, modeling involves the prediction of the structure
of a query sequence from the structures of one or more structural templates.
The procedure involves the identification of possible templates that have a
clear sequence relationship to the query, the assembly of the model, the
prediction of regions of the structure that are likely to have different
conformations than the templates (e.g. loops), and ultimately the
refinement of the structure in an attempt to account for inherent differences
between the template and query structures. Homology modeling figures heavily
as a rationale for structural genomics initiatives, under the stated
assumption that accurate models can be built for query sequences that have
greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major
factor in determining the quality of homology models. This is one of the
sources of the 30% rule, because alignment quality usually decreases
dramatically below about 30% sequence identity. (A structural explanation
for this observation has been offered by Chung and Subbiah, 1996.)
Advances in the accuracy of sequence alignments using structure-based
profile methods should result in continuing improvements in the quality of
homology models.
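To make the 30% rule concrete, the sketch below computes percent sequence
identity from a pair of already-aligned sequences and checks it against the
threshold. The function names and the gap character are assumptions made for
this example, and the denominator used here (aligned positions with no gap in
either sequence) is only one of several conventions in use; other tools may
report a different value for the same alignment.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    The choice of denominator is an assumption; conventions differ.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def above_identity_threshold(aligned_query, aligned_template, threshold=30.0):
    """True if the pair exceeds the identity threshold (30% by default)."""
    return percent_identity(aligned_query, aligned_template) > threshold


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))
```

A template passing this check is only a candidate; as noted above, the quality
of the underlying alignment still determines the quality of the model.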

With the number of protein-ligand complexes available in the Protein Data
Bank constantly growing, structure-based approaches to drug design and
screening have become increasingly important. Alongside this explosion of
structural information, a number of molecular docking methods have been
developed over recent years with the aim of maximally exploiting all
available structural and chemical information that can be derived from
proteins, from ligands, and from protein-ligand complexes. In this respect,
the term guided docking is introduced to refer to docking approaches that
incorporate some degree of chemical information to actively guide the
orientation of the ligand into the binding site. To reflect the focus on the
use of chemical information, a classification scheme for guided docking
approaches is proposed. In general terms, guided docking approaches can be
divided into indirect and direct approaches. Indirect approaches incorporate
chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches
incorporate chemical information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further
divided into protein-based, mapping-based, and ligand-based approaches, to
reflect the source used to derive the features capturing the chemical
information inside the protein cavity. Within each category, a representative
list of docking approaches is discussed. In view of the limitations of current
scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for
improving binding affinity estimations, ligand binding-mode predictions,
and virtual screening enrichments obtained from protein-ligand docking.

This review gives an introduction to ligand-receptor docking and
illustrates the basic underlying concepts. An overview of different
approaches and algorithms is provided. Although the application of docking
and scoring has led to some remarkable successes, there are still some major
challenges ahead, which are outlined here as well. Approaches to address
some of these challenges and the latest developments in the area are
presented. Some aspects of the assessment of docking program performance
are discussed. A number of successful applications of structure-based virtual
screening are described.

Material and methods


1. NCBI

Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information, all for the better
understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its domain
structure, post-translational modifications, variants, etc.), a minimal level
of redundancy, and a high level of integration with other databases.

2. Protein sequence

Protein: Glycerate kinase (HBeAg-binding protein 4), from human
Primary accession number: Q8IVS8
EC number: 2.7.1.31
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it is an extension of the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
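To make the idea of an optimal local alignment concrete, the following is a minimal sketch of Smith-Waterman scoring in Python. It is not the SSEARCH implementation: the function name and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, whereas real searches use substitution matrices such as BLOSUM and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: identity match/mismatch and a linear gap
    penalty (assumed values, not those used by SSEARCH).
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Only the score is returned here; recovering the aligned residues would additionally require storing the matrix and tracing back from the best cell.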

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.

The parameters it calculates include the following:

extinction coefficient

half-life

instability index

aliphatic index
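As an illustration of how one of these parameters can be derived from composition alone, the sketch below computes the extinction coefficient at 280 nm from the numbers of Trp, Tyr and Cys residues. The molar coefficients (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) are the values documented for ProtParam; the function names are hypothetical. The counts used in the example (4 Trp, 5 Tyr, 5 Cys, molecular weight 55252.6) are those reported for GLCTK_HUMAN in the results of this report.

```python
# Molar extinction coefficients at 280 nm in water (M-1 cm-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys residues


def extinction_coefficients(n_trp, n_tyr, n_cys):
    """Return (all Cys paired as cystines, no Cys paired) coefficients."""
    reduced = n_trp * EXT_TRP + n_tyr * EXT_TYR
    n_cystine = n_cys // 2            # an odd Cys is left unpaired
    return reduced + n_cystine * EXT_CYSTINE, reduced


def absorbance_at_1_g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 0.1% (1 g/l) solution."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    paired, reduced = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
    print(paired, round(absorbance_at_1_g_per_l(paired, 55252.6), 3))
    print(reduced, round(absorbance_at_1_g_per_l(reduced, 55252.6), 3))
```

With these inputs the sketch reproduces the two extinction coefficients (29700 and 29450) and absorbance values (0.538 and 0.533) listed in the ProtParam output of this report.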

Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors later reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Ran a BLAST similarity search to find a template structure and retrieved its PDB ID (2B8N).

Loaded the target sequence (in PDB format) into SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model using the parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and found the structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query sequence" or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses.

For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus, a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
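The E-value screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the `rank_templates` function, the thresholds (E-value at most 1e-5, query coverage at least 0.5) and the example hits are all hypothetical and are not taken from any BLAST output or from a particular tool's API.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template: str    # identifier of the candidate template structure
    evalue: float    # E-value reported by the database search
    coverage: float  # fraction of the query covered by the alignment (0..1)


def rank_templates(hits, max_evalue=1e-5, min_coverage=0.5):
    """Discard weak hits, then sort by E-value and, for ties, by coverage."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.evalue, -h.coverage))


if __name__ == "__main__":
    # Made-up candidates for illustration only.
    candidates = [
        Hit("templateA", 1e-50, 0.90),
        Hit("templateB", 6.0, 0.30),    # poor E-value: rejected
        Hit("templateC", 1e-20, 0.80),
    ]
    for hit in rank_templates(candidates):
        print(hit.template, hit.evalue, hit.coverage)
```

A filter of this kind only narrows the candidate list; the other criteria discussed in this section (function, secondary structure agreement, plausibility of the resulting model) still need to be judged separately.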

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modelling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interactions are therefore said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.

Fig: Protein receptor-ligand docking

Rigid Ligand with a Flexible Receptor

The native structure of the rigid ligand and flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow for docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A residue on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule which binds to a protein molecule (e.g. a receptor). As a ligand binds through the interaction of many weak noncovalent bonds formed to the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the ΔG of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. The angles which cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
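For readers who want to see how a point on the plot is obtained, the following is a minimal Python sketch of a torsion (dihedral) angle calculation from four atomic positions. The function names are hypothetical. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1). The coordinates in the example are made-up points, not atoms from any structure in this report.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)          # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Planar zig-zag (trans) arrangement of four made-up points.
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Computing φ and ψ this way for every residue and plotting one against the other gives the scatter of points that validation programs compare against the allowed regions.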

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run. The run time also depends on how many residues there are in the protein that is submitted.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file which has a file extension of .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, for instance two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
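The gradient-based procedure can be illustrated with a deliberately tiny example: two atoms interacting through a Lennard-Jones potential, minimized by steepest descent along their separation. This is a sketch under stated assumptions, not the minimizer used by any modeling package: the function names, reduced units (epsilon = sigma = 1), step size and displacement cap are all chosen for illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dV/dr; the force on the separation is its negative."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_separation(r, step=0.01, max_move=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent: move downhill along the gradient until the force is small."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot fling the atoms apart.
        move = max(-max_move, min(max_move, -step * g))
        r += move
    return r


if __name__ == "__main__":
    start = 1.0                      # atoms too close: strong repulsion
    end = minimize_separation(start)
    print(start, lj_energy(start))
    print(end, lj_energy(end))
```

Starting from a separation where the two atoms overlap, the loop relaxes to the nearby minimum at r = 2^(1/6) sigma, where the energy is -epsilon. In a real protein the same downhill search runs over all atomic coordinates at once, which is why it ends in whichever local minimum is nearest to the starting conformation.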

Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information

Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db AC        Description                                                      Score  E-value
pdb 1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5            206    6e-54
pdb 2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb 2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid, Strain Ad                  192    6e-50
pdb 1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27     6.0
pdb 1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I            27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:

Residue    Count   %
Ala (A)    74      14.1
Arg (R)    33      6.3
Asn (N)    11      2.1
Asp (D)    21      4.0
Cys (C)    5       1.0
Gln (Q)    32      6.1
Glu (E)    28      5.4
Gly (G)    51      9.8
His (H)    16      3.1
Ile (I)    15      2.9
Leu (L)    81      15.5
Lys (K)    10      1.9
Met (M)    12      2.3
Phe (F)    10      1.9
Pro (P)    28      5.4
Ser (S)    27      5.2
Thr (T)    22      4.2
Trp (W)    4       0.8
Tyr (Y)    5       1.0
Val (V)    38      7.3
Pyl (O)    0       0.0
Sec (U)    0       0.0
(B)        0       0.0
(Z)        0       0.0
(X)        0       0.0

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon C: 2435
Hydrogen H: 3967
Nitrogen N: 711
Oxygen O: 719
Sulfur S: 17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700; Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient: 29450; Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines

Secondary structure prediction

By SOPMA: result for UNK_158250

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
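The percentages in the SOPMA summary are simply the per-state residue counts divided by the sequence length. The short Python sketch below recomputes them from the counts reported above as a consistency check; the dictionary and function names are hypothetical, and only the four states with non-zero counts are included.

```python
# Residue counts per predicted state, as reported in the SOPMA output above.
SOPMA_COUNTS = {
    "Alpha helix (Hh)": 235,
    "Extended strand (Ee)": 80,
    "Beta turn (Tt)": 36,
    "Random coil (Cc)": 172,
}
SEQUENCE_LENGTH = 523


def state_percentages(counts, length):
    """Percentage of residues assigned to each secondary-structure state."""
    return {state: round(100.0 * n / length, 2) for state, n in counts.items()}


if __name__ == "__main__":
    # The four non-zero states should account for every residue.
    assert sum(SOPMA_COUNTS.values()) == SEQUENCE_LENGTH
    for state, pct in state_percentages(SOPMA_COUNTS, SEQUENCE_LENGTH).items():
        print(f"{state}: {pct:.2f}%")
```

The recomputed values agree with the reported ones (44.93%, 15.30%, 6.88% and 32.89%), and the four counts add up to the full sequence length of 523.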

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C, which showed around 85.6% identity with the target sequence, was selected as the template, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from file (PDB file), then choose Swiss model - load the raw target sequence.

Choose Fit - fit raw sequence, then magic fit, then iterative fit.

Choose File - save - layer (pdb).

Choose File - save - project (pdb).

Choose Swiss model - submit modeling request. (A new browser window opens; load the PDB file and give the email ID for receiving the modeled structure.)

Open the new structure (received by email) and remove the template by selecting the target.

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homologous modeling

Optimise Mode Request submission form

Please fill these fields

Your Email address

Lakshay1202gmailcom (MUST be correct)

Your Name Lakshay

Request title: Lakshay project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig. Structure of template after modeling

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ and ψ of each amino acid residue in a protein structure by plotting φ against ψ, and it shows which combinations of the two angles are possible for a polypeptide. Because the distance between two successive alpha carbon atoms and the bond angles at those atoms are nearly fixed, these two angles largely determine the course of the backbone chain.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
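Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms. The following is a minimal sketch of that calculation, not the code used by PROCHECK or SAVS; the helper names are hypothetical and the coordinates in the example are made up to give an easily checked answer. For residue i, φ is the torsion C(i-1)-N(i)-CA(i)-C(i) and ψ is the torsion N(i)-CA(i)-C(i)-N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points (range -180 to 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the central bond lies along z and the two outer
# atoms are a quarter turn apart around it.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

With real coordinates, calling this function once per residue with the φ atoms and once with the ψ atoms gives the (φ, ψ) pairs that the plot displays.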

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result: Plot statistics

                                                          Score    %age
Residues in most favoured regions [A,B,L]                   990    85.6%
Residues in additional allowed regions [a,b,l,p]            104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11     1.0%
Residues in disallowed regions                               51     4.4%
                                                           ----   ------
Number of non-glycine and non-proline residues             1156   100.0%
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
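The percentages in the Procheck summary follow directly from the residue counts, and they can be checked against the 90% guideline quoted above. The short sketch below redoes that arithmetic with the counts reported for this model; the variable names are hypothetical, and the percentages are taken over non-glycine, non-proline residues as in the summary.

```python
# Residue counts per Ramachandran region, as reported in the Procheck summary.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Denominator: non-glycine, non-proline residues (end-residues excluded).
total = sum(counts.values())

percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

# Guideline quoted by Procheck for a good quality model.
GOOD_MODEL_THRESHOLD = 90.0
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:6.1f}%")
print("over 90% in most favoured regions:", meets_guideline)
```

For this model the most favoured fraction falls below the guideline, so the residues in the disallowed regions are the natural place to start when refining the structure.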

Docking result using Hex software

Fig. Ligand & Receptor (2B8N)

Fig. After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file: C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP CG  Radius = 1.40  Charge = 0.62
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40  Charge = -0.52
  ASP CA  Radius = 1.50  Charge = 0.25
  ASP C   Radius = 1.40  Charge = 0.53
  ASP O   Radius = 1.50  Charge = -0.50
  ASP CB  Radius = 1.70  Charge = -0.21
  ASP H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40  Charge = -0.52
  LYS CA  Radius = 1.50  Charge = 0.23
  LYS C   Radius = 1.40  Charge = 0.53
  LYS O   Radius = 1.50  Charge = -0.50
  LYS CB  Radius = 1.70  Charge = 0.04
  LYS CG  Radius = 1.70  Charge = 0.05
  LYS CD  Radius = 1.70  Charge = 0.05
  LYS CE  Radius = 1.70  Charge = 0.22
  LYS H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40  Charge = -0.52
  MSE CA  Radius = 1.50  Charge = 0.14
  MSE C   Radius = 1.40  Charge = 0.53
  MSE O   Radius = 1.50  Charge = -0.50
  MSE CB  Radius = 1.70  Charge = 0.04
  MSE CG  Radius = 1.70  Charge = 0.09
  MSE SE  Radius = 1.90  Charge = 0.32
  MSE CE  Radius = 1.90  Charge = 0.01
  MSE H   Radius = 0.00  Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40  Charge = -0.52
  THR CA  Radius = 1.50  Charge = 0.27
  THR C   Radius = 1.40  Charge = 0.53
  THR O   Radius = 1.50  Charge = -0.50
  THR CB  Radius = 1.50  Charge = 0.21
  THR H   Radius = 0.00  Charge = 0.25

Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time: 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor: 2B8N and ligand: 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00   R = 0.75   R = 1.50   R = 2.25   R = 3.00   R = 3.75   R = 4.50
R = 5.25   R = 6.00   R = 6.75   R = 7.50   R = 8.25   R = 9.00   R = 9.75
R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00
R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)

Surviving rotational steps (N=25) Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00   R = 0.40   R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ---------   ---------   ---------   ---------   ----------------   -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
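The size of the search reported in the docking log can be reproduced with simple arithmetic, which helps when reading the timings. The sketch below assumes that the garbled bracketed figures in the log stand for 27 distance steps, 812 receptor rotations and 1 ligand rotation, and for an FFT grid of 64 x 24 x 48; that reading is an inference, although it reproduces both the 21924 FFTs and the 1616412672 orientations quoted in the log. The variable names are hypothetical.

```python
# Distance range and step size as set in the docking log.
r_min, r_max, r_step = 0.0, 19.5, 0.75
n_distance_steps = int(round((r_max - r_min) / r_step)) + 1

# Assumed reading of the bracketed figures in the log (see the note above).
receptor_rotations = 812
ligand_rotations = 1
fft_grid = (64, 24, 48)

# One 3D FFT is run for every combination of distance step and rotation.
n_ffts = n_distance_steps * receptor_rotations * ligand_rotations

# Each FFT scores every cell of the grid in a single pass.
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * orientations_per_fft

# The initial scan took 7 min 0 sec according to the log.
scan_seconds = 7 * 60
rate_per_second = total_orientations / scan_seconds

print("distance steps:", n_distance_steps)
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
print(f"orientations per second: {rate_per_second:,.0f}")
```

The computed rate comes out close to the figure printed in the log; the small difference is expected because the log reports the elapsed time rounded to whole seconds.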

Conclusion

After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw firm conclusions about the true picture of evolution, and that the genes responsible for pathogenesis can also be identified.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of a protein present in the different species, we carried out homology modelling. We took a step forward by presenting a theoretical model built with available online modelling tools.

In our study, the HBeAg (Glycerate kinase) protein encoded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs can be developed.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analyzing data, as well as the interpretation of the results as new biological information.

The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and more effectively in future by analyzing the modeled structure of the protein.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope, however, that these findings will go a long way and will prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package
