Centre for Bioinformatics

Bio Informatics


Protein structural analysis


INTRODUCTION

Gene map locus 21q22.12

Carbonyl reductase (EC 1.1.1.184) is one of several monomeric, NADPH-dependent oxidoreductases with wide specificity for carbonyl compounds; these enzymes are generally referred to as aldo-keto reductases. Others include aldehyde reductase (EC 1.1.1.2; 103830) and aldose reductase (EC 1.1.1.21; 103880). Wermuth et al. (1988) isolated and characterized a cDNA complementary to carbonyl reductase mRNA from a human placenta cDNA library. The cDNA contained an open reading frame encoding a protein of 277 amino acids with a molecular weight of 30,375. Comparison of the predicted protein sequence with the primary structures of other aldo-keto reductases showed no significant homologies. A possible homology, on the other hand, was found between carbonyl reductase and 'short' subunit alcohol/polyol dehydrogenases. Carbonyl reductase catalyzes the reduction of a great variety of carbonyl compounds, e.g., quinones derived from polycyclic aromatic hydrocarbons, 9-ketoprostaglandins, and the antitumor anthracycline antibiotics daunorubicin and doxorubicin. The enzyme is widely distributed in human tissues and also occurs in other mammalian and nonmammalian species.

Forrest et al. (1990) cloned a carbonyl reductase cDNA of 1,219 base pairs from a breast cancer cell line. Southern analysis of genomic DNA digested with several restriction enzymes and hybridized with a labeled cDNA probe indicated that carbonyl reductase is probably encoded by a single gene and does not belong to a family of structurally similar enzymes. Southern analysis of 17 mouse/human somatic cell hybrids showed that carbonyl reductase is located on chromosome 21. Carbonyl reductase mRNA was induced 3- or 4-fold in 24 hours by BHA, beta-naphthoflavone, or Sudan 1.

Avramopoulos et al. (1992) confirmed the assignment to chromosome 21 by genetic linkage mapping using a DNA polymorphism from the 3-prime untranslated region of the CBR gene. They demonstrated, furthermore, that the gene lies between the gene for interferon-alpha receptor (107450) and D21S55, about 3.4 and 7.2 cM, respectively, from the 2 flanking loci. The findings placed CBR in the telomeric band 21q22.3. By high-resolution fluorescence in situ hybridization, Lemieux et al. (1993) mapped the CBR gene to 21q22.12, very close to the SOD1 locus at position 21q22.11.

CBR displayed gene dosage effects in trisomy 21 human lymphoblasts at both the DNA and the mRNA levels. With increasing chromosome 21 ploidy, lymphoblasts also showed increased aldo-keto reductase activity and increased quinone reductase activity. Both of these activities have been shown to be associated with carbonyl reductase. The location of CBR near SOD1, together with the increased enzyme activity and the potential for free radical modulation in trisomy 21 cells, implicates CBR as a candidate contributor to the pathology of Down syndrome.


STRUCTURE OF PROTEIN


METHODOLOGY

Because the volume of information on the web is huge, numerous search engines exist to aid information search. Several of them, such as Google, AltaVista, Infoseek and HotBot, are widely used by internet users.

A. DATABASE RETRIEVAL

RETRIEVAL OF INFORMATION FROM EXPASY

Type www.expasy.org in the address column

(OR)

Go to google.com and type “Expasy”. Select the proteomics server link and click it.

In the proteomics server page, enter your protein name in the box at the top and click “GO”.

In the result page, select your protein by carefully reading the name and description, then click its protein ID.

Save your Expasy result page and note the protein ID. From then on you can type the protein ID directly in the top box to retrieve your Expasy page.

Go through the Expasy result page. Links to all databases are listed under the heading “Cross-references”.

Model Expasy page “Cross-references”

Cross-references

Sequence databases

EMBL

Y00970; CAA68784.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

X54017; CAA37964.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

X54018; CAA37964.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

X54019; CAA37964.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

X54020; CAA37964.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

M77378; AAA51572.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

M77379; AAA51573.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

M77380; AAA51574.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

M77381; AAA51575.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

X66188; CAA46956.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]

X54018; CAA46956.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

X54019; CAA46956.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

X54020; CAA46956.1; JOINED; Genomic_DNA.[EMBL / GenBank / DDBJ] [CoDingSequence]

PIR S11674; S11674.

UniGene Hs.370870

3D structure databases
HSSP P08001; 1FIZ. [HSSP ENTRY / PDB]
SMR P10323; 43-301.
ModBase P10323.

Protein-protein interaction databases
DIP P10323.

Protein family/group databases
MEROPS S01.223; -.

2D gel databases
SWISS-2DPAGE Get region on 2D PAGE.

Organism-specific gene databases
HGNC HGNC:126; ACR.
GeneCards ACR.
GeneLynx ACR; Homo sapiens.
GenAtlas ACR.
MIM 102480; gene. [NCBI / EBI]
HOVERGEN [Family / Alignment / Tree]

Gene expression databases
CleanEx HGNC:126; ACR.

Ontologies
GO
GO:0004284; Molecular function: acrosin activity (traceable author statement).
GO:0005515; Molecular function: protein binding (inferred from physical interaction).
GO:0007340; Biological process: acrosome reaction (traceable author statement).
QuickGo view.

Family and domain databases
InterPro
IPR012267; Pept_S1A_acrosin.
IPR009003; Pept_Ser_Cys.
IPR001254; Peptidase_S1_S6.
IPR001314; Peptidase_S1A.
Graphical view of domain structure.
Pfam PF00089; Trypsin; 1. Pfam graphical view of domain structure.
PIRSF PIRSF001141; Acrosin; 1.
PRINTS PR00722; CHYMOTRYPSIN.
SMART SM00020; Tryp_SPc; 1. SMART graphical view of domain structure.
PROSITE
PS50240; TRYPSIN_DOM; 1.
PS00134; TRYPSIN_HIS; 1.
PS00135; TRYPSIN_SER; 1.
PROSITE graphical view of domain structure (profiles).
ProDom [Domain structure / List of seq. sharing at least 1 domain]
BLOCKS P10323.

Genome annotation databases
Ensembl ENSG00000100312; Homo sapiens. [Contig view]

Other
SOURCE ACR; Homo sapiens.
ProtoNet P10323.
UniRef View cluster of proteins with at least 50% / 90% / 100% identity.

RETRIEVAL OF SEQUENCE FROM NCBI

Collection of nucleotide sequence from NCBI

Go to the Expasy page, right-click the first “GenBank” link under Cross-references and open it in a new window.

Within the result page click the “go” icon next to the box at the top.

Click the ID once more to open the GenBank page and save it.

Go to “Display” and select FASTA.

Select the FASTA-format text, copying from the operator sign (“>”) onward, paste it into a new Notepad window and save it under the file name NUCLEOTIDE FASTA.

Collection of protein sequence from NCBI

Go to the GenBank web page and click the protein ID link.

Save the page in GenPept format.
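The same two records can also be requested without clicking through the web pages. The sketch below is a minimal illustration, not part of the original protocol: it assumes the NCBI E-utilities EFetch endpoint and only builds the request URLs for the nucleotide accession J04056 and the protein ID AAA52070.1 that appear in the Results. The helper name `efetch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL that returns one record as plain text."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EUTILS_EFETCH}?{query}"


# Nucleotide record in FASTA format (the NUCLEOTIDE FASTA file of the protocol).
nucleotide_fasta_url = efetch_url("J04056")
# Protein record in GenPept format ("gp" is the EFetch name for GenPept).
protein_genpept_url = efetch_url("AAA52070.1", db="protein", rettype="gp")

print(nucleotide_fasta_url)
print(protein_genpept_url)
```

Either URL can be opened in a browser or read with `urllib.request.urlopen(url).read()`; the download step is left out here so that the sketch runs without network access.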

RETRIEVAL OF SEQUENCE FROM EMBL

EMBL: the European Molecular Biology Laboratory (http://www.embl-heidelberg.de/) is a widely used site for information retrieval. It supports various databases and thus serves as an integrated retrieval platform that lets the user access them seamlessly.

Collection of nucleotide sequence from EMBL

Go to the Expasy page, right-click the first “EMBL” link under Cross-references and open it in a new window.

Within the result page, copy the ID number, paste it into the box at the top and click “GO”.

The entry page lists a series of sequence IDs with hyperlinks (mostly ordered by proximity to your protein name). Look for your protein and follow its hyperlink.

Save it in EMBL format.

Collection of protein sequence from EMBL

Go to the EMBL web page and click the protein ID link.

Save the page in UniProt format.

RETRIEVAL OF INFORMATION FROM OTHER DATABASES

Right-click the PDB link under the 3D structure databases, open it in a new window and save it.

Similarly, open the ENZYME, HGNC, GeneCards, GenAtlas, MIM, Pfam and Ensembl web pages in new windows by right-clicking the ID numbers on the right-hand side, and save the web pages. Copy the HGNC page and paste it into a Word document.

Right-click the FASTA format link in the bottom right-hand corner of the Expasy page and open it in a new window. Copy the FASTA-format text and paste it into Notepad. Save it as fasta protein.
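Once a FASTA file has been saved, it helps to confirm that it contains what was intended before submitting it to the tools below. The following is a minimal sketch of a FASTA reader; the function name `parse_fasta` and the demo record are hypothetical, and the 20 residues used are simply the first residues of the carbonyl reductase translation.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>demo_protein hypothetical example record
MSSGIHVALV
TGGNKGIGLA
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

The header line starts at the “>” sign, which is why the protocol asks for the text to be copied from that character onward.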


TOOLS AND TECHNIQUES

1. HOMOLOGY AND SIMILARITY TOOLS

1.1 PAIR WISE SEQUENCE ANALYSIS

1.1.1 BLAST

Go to http://www.ncbi.nlm.nih.gov

Click the BLAST link.

Click the protein-protein query link.

Submit the protein sequence in FASTA format in the submission box.

Run BLAST.

In the result page click the “FORMAT” icon.

Save the web page as BLAST RESULT.

1.1.2 FASTA

Go to http://sbr.ebi.ac.uk

Click the TOOLS button.

Click “homology and similarity”.

Select FASTA.

Open the FASTA submission form.

(OR)

Go to google.com and type “Fasta”. Click the link “Fasta similarity searching against protein databases”.

Paste the protein sequence in FASTA format.

Run FASTA.

In the result page click the “Fasta result” icon.

Save the web page as FASTA RESULT.

1.1.3 EMBOSS ALIGN

Type http://www.ebi.ac.uk/emboss/align in the address column

(OR)

Type “Emboss align” in google.com and select the link “Pairwise Alignment algorithms form”.

Within the submission form select “Needle” (global alignment) and submit the FASTA protein sequence in the first box.

In the second box submit one more sequence for comparison and click “RUN”.

Repeat with “Water” (local alignment): select it in the submission form and submit the FASTA protein sequence in the first box.

In the second box submit one more sequence for comparison and click “RUN”.

Save the results.
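The difference between the two EMBOSS programs can be illustrated with a small scoring sketch. Needle performs a global (Needleman-Wunsch) alignment and Water a local (Smith-Waterman) alignment. The code below is a simplified illustration and not the EMBOSS implementation: it assumes a flat match/mismatch score and a linear gap penalty, whereas the real programs use a substitution matrix and separate gap-open and gap-extend penalties. The function names are hypothetical.

```python
def needle_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def water_score(a, b, match=1, mismatch=-1, gap=-1):
    """Local (Smith-Waterman) alignment score; cell scores never drop below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


print(needle_score("GATTACA", "GCATGCU"))  # end-to-end alignment of both sequences
print(water_score("TTACGTTT", "ACGT"))     # best-matching sub-region only
```

A global alignment is the natural choice when two sequences of similar length are expected to be related over their whole length; a local alignment is preferable when only a shared domain or motif is expected.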

1.2 MULTIPLE SEQUENCE ALIGNMENT

1.2.1 CLUSTALW

Type www.ebi.ac.uk/clustalw in the address column

(OR)

Type “ClustalW” in google.com. Select the link “ClustalW”.

Submit multiple protein sequences in FASTA format and click the “Run” icon.

In the result page click “Show Color” and save the result page.

1.2.2 T-COFFEE

Type http://www.ch.embnet.org/software/TCoffee.html in the address column.

(OR)

Type “T-Coffee” in google.com. Select the link “T-COFFEE server”.

Submit multiple protein sequences in FASTA format and click the “Run T-Coffee” icon. Save the result web page.

2. FUNCTIONAL ANALYSIS TOOLS

2.1 PATTERN SEARCH

2.1.1 SCANPROSITE

Type http://www.expasy.org/tools/scanprosite in the address column.

(OR)

Type “ScanProsite” in google.com. Select the link “ExPASy - ScanProsite”.

Submit your protein sequence in FASTA format. Click the “Start the scan” icon.

Save the result. Click the PROSITE ID link (blue hyperlink) and save the documentation page.
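To make clear what a pattern scan does, the sketch below translates PROSITE pattern syntax into a regular expression and reports every match position. It is an illustration only and not the ScanProsite implementation; the function names are hypothetical, and the pattern used is the well-known N-glycosylation site pattern N-{P}-[ST]-{P} (PS00001), applied to a short fragment of the carbonyl reductase sequence listed in the Results.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x      -> any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


fragment = "QGRVVNVSSIMS"  # residues copied from the P16152 sequence
hits = scan(fragment, "N-{P}-[ST]-{P}")
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(hits)
```

A pattern hit of this kind only shows that the residues fit the pattern; short patterns such as this one match many proteins by chance, which is why the documentation page of each PROSITE entry should be read alongside the result.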

2.1.2 INTERPRO

Type http://www.ebi.ac.uk/InterProScan in the address column.

(OR)

Type “InterProScan” in google.com and select the link InterProScan.

Submit the protein sequence in FASTA format. Click the “Submit Job” icon and save the result.

2.1.3 BLOCKS

Type http://bioinformatics.weizmann.ac.il/blocks/blocks_search.html in the address column.

(OR)

Type “Block Server” in google.com and select the link Block Search .

Submit protein sequence in fasta format. Click “Perform Search” icon and Save the result.

2.1.4 SMART

Type http://smart.embl-heidelberg.de in the address column.

(OR)

Type “Smart” in google.com and select the link “SMART: Main page”.

Select “Normal mode” and submit the protein sequence in FASTA format.

Click the “Sequence Smart” icon. Copy the web page and paste it into a Word document.

2.2 MOTIF SEARCH

2.2.1 MEME

Type http://meme.sdsc.edu/meme/meme.html in the address column.

(OR)

Type “MEME” in google.com and select the link “MEME - Submission form”.

Enter your e-mail address (compulsory) and submit multiple protein sequences in FASTA format. Click “Start Search”. Go to your mailbox and collect your MEME result.

2.2.2 MAST

Open a Notepad window and type ALPHABET= ACDEFGHIKLMNPQRSTVWY

Go through the MEME result page. Under the heading “Motif 1 position-specific scoring matrix” click the “ViewPSSM1” icon. Copy the motif 1 details and paste them into the same Notepad file. Similarly, under the headings “Motif 2 position-specific scoring matrix” and “Motif 3 position-specific scoring matrix” click the “ViewPSSM2” and “ViewPSSM3” icons and paste the details into the same file. Save the file as “Motif Details”.

Type http://meme.sdsc.edu/meme/mast.html in the address column.

(OR)

Type “MEME” in google.com and select the link “MEME - Submission form”.

Under “menu” click “submit a job” and select MAST.

In the MAST submission form enter your e-mail address (compulsory).

Browse for the “Motif Details” Notepad file in the submission form.

Select the “Swissprot” database under the MAST database column.

Click “Start Search”. Go to your mailbox and collect your MAST result.

3. STRUCTURAL ANALYSIS TOOLS

3.1 SECONDARY STRUCTURE PREDICTION

3.1.1 GOR IV

Type http://npsa-pbil.ibcp.fr in the address column

(OR)

Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/

Under Secondary structure prediction click GOR IV.

Submit raw protein sequence. Click “Submit” icon.

Copy the result and paste it in word document.

3.1.2 SOPMA

Type http://npsa-pbil.ibcp.fr in the address column

(OR)

Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/

Under Secondary structure prediction click SOPMA.

Submit raw protein sequence. Click “Submit” icon.

Copy the result and paste it in word document.

3.1.3 PHD

Type http://npsa-pbil.ibcp.fr in the address column

(OR)

Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/

Under Secondary structure prediction click PHD.

Submit raw protein sequence. Click “Submit” icon.

Copy the result and paste it in word document.

3.1.4 NN PREDICT

Type http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html in the address column

(OR)


Type “nnpredict” in google.com and select the link nnpredict input form .

Submit raw protein sequence. Click “Submit” icon.

Save the result.

3.1.5 JPRED

Type http://www.compbio.dundee.ac.uk/~www-jpred/submit.html in the address column

(OR)

Type “JPRED” in google.com and select the link Jpred submission form

Enter your e-mail address (compulsory). Submit the raw protein sequence.

Tick the check box in the 4th column.

Click the “Run secondary structure prediction” icon.

Wait until the result page appears and save the page.

3.1.6 PREDICT PROTEIN

Type http://www.predictprotein.org/meta in the address column

(OR)

Type “predictprotein server” in google.com and select the link META II - PredictProtein server

Enter your e-mail address (compulsory). Submit the raw protein sequence.

Enable JPRED, PHD, PROF and PSIPRED under the heading “Protein structure”.

Click the “Submit / Run prediction” icon.

Go to your mailbox and collect the result by clicking the link.

Save the result page.

3.2 PROTEIN VISUALISATION TOOL

3.2.1 RASMOL

Type http://www.rcsb.org/pdb/Welcome.do in the address column.

In the PDB page enter the protein PDB ID (for example “1BAK”) and click “search”.

In the result page click all the images under the heading “DISPLAY OPTIONS”.

Under the heading “Display Molecule” on the left-hand side of the web page, click the RasMol Viewer link and download the PDB file. Save it on the desktop.

Open the PDB file in the RasMol window.

Go to the display mode and select Ball and Stick, Cartoon, Strands and Space fill one by one.

Copy the protein structure in each mode and paste it into a Word document.


4. SEQUENCE ANALYSIS TOOLS

4.1 ORF PREDICTION TOOL

ORF FINDER (NCBI)

Type http://www.ncbi.nlm.nih.gov/projects/gorf/ in the address column

(OR)

Type “NCBI ORF Finder” in google.com and select the link ORF Finder

Submit raw nucleotide sequence and click OrfFind icon.

In result page click first ORF graphical picture.

Copy the result and paste it in word document.
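What an ORF search does can be sketched in a few lines. The code below is a simplified illustration and not the NCBI ORF Finder algorithm: it assumes the standard genetic code, scans only the three forward reading frames (the NCBI tool also scans the reverse strand) and reports stretches that run from an ATG to the next in-frame stop codon. The function name `find_orfs` and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=1):
    """Return (start, end, protein) for ATG..stop ORFs in the forward frames.

    start and end are 1-based and inclusive; end includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = "".join(
                    CODON_TABLE.get(dna[k:k + 3], "X") for k in range(start, i, 3)
                )
                if len(protein) >= min_codons:
                    orfs.append((start + 1, i + 3, protein))
                start = None  # look for the next ORF in this frame
    return orfs


toy = "CCATGAAATAGCC"  # hypothetical sequence: ATG AAA TAG in the third frame
print(find_orfs(toy))
```

As a consistency check on the record retrieved in this report, the GenBank entry lists the coding sequence at 94..927, i.e. 834 nucleotides or 278 codons, which corresponds to the 277-residue protein plus one stop codon.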

4.2 SPLICE SITE PREDICTION TOOL

NETGENE2

Type http://www.cbs.dtu.dk/services/NetGene2/ in the address column

(OR)

Type “Netgene2” in google.com and select the link NetGene2 Server

Select organism name. Submit raw nucleotide sequence

Click “Send file” icon and save the result.

4.3 GENE PREDICTION TOOL

GENSCAN

Type http://genes.mit.edu/GENSCAN.html in the address column

(OR)

Type “Genscan” in google.com and select the link New GENSCAN Web Server at MIT

Select organism name. Submit raw nucleotide sequence

Click “Run GENSCAN” icon and save the result.

4.4 RESTRICTION MAPPING

NEB CUTTER

Type http://tools.neb.com/NEBcutter2/index.php in the address column

(OR)

Type “NEBcutter” in google.com and select the link NEBcutter V2.0

Submit raw nucleotide sequence. Click “Submit” icon and save the result.
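At its core, restriction mapping is a search for enzyme recognition sequences in the submitted DNA. The sketch below is an illustration only and not how NEBcutter is implemented: it looks up three common enzymes with their well-known recognition sites and reports the 1-based position at which each site begins. The dictionary, the function name and the toy sequence are hypothetical choices; cut positions within the site and enzyme methylation sensitivity are ignored.

```python
# Recognition sequences (5'->3') of three common type II restriction enzymes.
ENZYMES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "BamHI": "GGATCC"}


def find_sites(dna, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    dna = dna.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        pos = dna.find(site)
        while pos != -1:
            positions.append(pos + 1)
            pos = dna.find(site, pos + 1)  # continue after the previous hit
        hits[name] = positions
    return hits


toy = "AAGAATTCTTCTGCAGAA"  # hypothetical sequence with one EcoRI and one PstI site
print(find_sites(toy))
```

Because all three recognition sequences are palindromic, searching one strand is sufficient for them; enzymes with non-palindromic sites would also need a search on the reverse complement.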


RESULTS

A. DATABASE RETRIEVAL

Expasy

UniProtKB/Swiss-Prot entry P16152

Entry information
Entry name: DHCA_HUMAN
Primary accession number: P16152
Secondary accession numbers: None
Integrated into Swiss-Prot on: April 1, 1990
Sequence was last modified on: April 1, 1993 (Sequence version 2)
Annotations were last modified on: March 7, 2006 (Entry version 66)

Name and origin of the protein
Protein name: Carbonyl reductase [NADPH] 1
Synonyms:
EC 1.1.1.184
NADPH-dependent carbonyl reductase 1
Prostaglandin-E(2) 9-reductase
EC 1.1.1.189
Prostaglandin 9-ketoreductase
15-hydroxyprostaglandin dehydrogenase [NADP+]
EC 1.1.1.197
Gene name: CBR1 (Synonyms: CBR, CRN)
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

References

[1] NUCLEOTIDE SEQUENCE [MRNA], AND PARTIAL PROTEIN SEQUENCE.
TISSUE=Placenta; PubMed=3141401 [NCBI, ExPASy, EBI, Israel, Japan]
Wermuth B., Bohren K.M., Heinemann G., von Wartburg J.-P., Gabbay K.H.;
"Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein.";
J. Biol. Chem. 263:16185-16188(1988).

[2] NUCLEOTIDE SEQUENCE, AND PARTIAL PROTEIN SEQUENCE.
TISSUE=Mammary gland; DOI=10.1016/0167-4781(90)90050-C; PubMed=2182121 [NCBI, ExPASy, EBI, Israel, Japan]
Forrest G.L., Akman S., Krutzik S., Paxton R.J., Sparkes R.S., Doroshow J., Felsted R.L., Mohandas T., Bachur N.R.;
"Induction of a human carbonyl reductase gene located on chromosome 21.";
Biochim. Biophys. Acta 1048:149-155(1990).

Comments
FUNCTION: Catalyzes the reduction of a wide variety of carbonyl compounds including the antitumor anthracycline antibiotics. Can convert prostaglandin E2 to prostaglandin F2-alpha.
CATALYTIC ACTIVITY: R-CHOH-R' + NADP+ = R-CO-R' + NADPH.
CATALYTIC ACTIVITY: (13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
CATALYTIC ACTIVITY: (5Z,13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (5Z,13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
SUBUNIT: Monomer.
SUBCELLULAR LOCATION: Cytoplasm.

SIMILARITY: Belongs to the short-chain dehydrogenases/reductases (SDR) family.

Cross-references

Sequence databases
EMBL
J04056; AAA52070.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M62420; AAA17881.1; -; Unassigned_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AB003151; BAA33498.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AP000688; BAA89424.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BT019843; AAV38646.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
CR541708; CAG46509.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AP001724; BAA95508.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BC002511; AAH02511.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BC015640; AAH15640.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
PIR A61271; RDHUCB.

3D structure databases
PDB 1WMA; X-ray; A=1-276. [ExPASy / RCSB / EBI]
ModBase P16152.

Protein-protein interaction databases
IntAct P16152; -.
DIP P16152.

2D gel databases
SWISS-2DPAGE Get region on 2D PAGE.

Organism-specific gene databases
H-InvDB HIX0016099; -.
HGNC HGNC:1548; CBR1.
GeneCards CBR1.
GeneLynx CBR1; Homo sapiens.
GenAtlas CBR1.
MIM 114830; gene. [NCBI / EBI]
HOVERGEN [Family / Alignment / Tree]

Gene expression databases
CleanEx HGNC:1548; CBR1.

Ontologies
GO GO:0004090; Molecular function: carbonyl reductase (NADPH) activity (traceable author statement). QuickGo view.

Family and domain databases
InterPro IPR002347; ADH_short_C2. IPR002198; SDR. Graphical view of domain structure.
PANTHER PTHR19410; ADH_short; 2.
Pfam PF00106; adh_short; 1. Pfam graphical view of domain structure.
PRINTS PR00081; GDHRDH. PR00080; SDRFAMILY.
PROSITE PS00061; ADH_SHORT; 1.
ProDom [Domain structure / List of seq. sharing at least 1 domain]
BLOCKS P16152.

Genome annotation databases
Ensembl ENSG00000159228; Homo sapiens. [Contig view]

Other
LinkHub P16152; -.
SOURCE CBR1; Homo sapiens.
ProtoNet P16152.
UniRef View cluster of proteins with at least 50% / 90% / 100% identity.

Keywords
3D-structure; Acetylation; Direct protein sequencing; NADP; Oxidoreductase.

Features
Feature table viewer

Key        From   To    Length   Description                       FTId
INIT_MET   0      0
CHAIN      1      276   276      Carbonyl reductase [NADPH] 1.     PRO_0000054602
NP_BIND    9      33    25       NADP (By similarity).
ACT_SITE   193    193            Proton acceptor (By similarity).
BINDING    139    139            Substrate (By similarity).
MOD_RES    1      1              N-acetylserine.
MOD_RES    238    238            N6-1-carboxyethyl lysine.
STRAND     6      11    6
STRAND     13     14    2
HELIX      15     27    13

Sequence information

Length: 276 AA [This is the length of the unprocessed precursor]

Molecular weight: 30244 Da [This is the MW of the unprocessed precursor]

CRC64: 78E83065F5677733 [This is a checksum on the sequence]

        10         20         30         40         50         60
SSGIHVALVT GGNKGIGLAI VRDLCRLFSG DVVLTARDVT RGQAAVQQLQ AEGLSPRFHQ

        70         80         90        100        110        120
LDIDDLQSIR ALRDFLRKEY GGLDVLVNNA GIAFKVADPT PFHIQAEVTM KTNFFGTRDV

       130        140        150        160        170        180
CTELLPLIKP QGRVVNVSSI MSVRALKSCS PELQQKFRSE TITEEELVGL MNKFVEDTKK

       190        200        210        220        230        240
GVHQKEGWPS SAYGVTKIGV TVLSRIHARK LSEQRKGDKI LLNACCPGWV RTDMAGPKAT

       250        260        270
KSPEEGAETP VYLALLPPDA EGPHGQFVSE KRVEQW

P16152 in FASTA format
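The length and molecular weight reported for the entry can be cross-checked directly from the listed sequence. The sketch below is a rough check under stated assumptions: it uses standard average residue masses, adds one water molecule for the free termini, and ignores the modifications listed in the feature table (N-acetylserine, N6-1-carboxyethyl lysine). The constant and function names are hypothetical.

```python
# Average masses (Da) of amino acid residues within a peptide chain.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.015  # one water is added for the free N- and C-termini

# Sequence as listed for P16152 (spaces and line breaks are stripped).
SEQUENCE = "".join("""
SSGIHVALVT GGNKGIGLAI VRDLCRLFSG DVVLTARDVT RGQAAVQQLQ AEGLSPRFHQ
LDIDDLQSIR ALRDFLRKEY GGLDVLVNNA GIAFKVADPT PFHIQAEVTM KTNFFGTRDV
CTELLPLIKP QGRVVNVSSI MSVRALKSCS PELQQKFRSE TITEEELVGL MNKFVEDTKK
GVHQKEGWPS SAYGVTKIGV TVLSRIHARK LSEQRKGDKI LLNACCPGWV RTDMAGPKAT
KSPEEGAETP VYLALLPPDA EGPHGQFVSE KRVEQW
""".split())


def molecular_weight(seq):
    """Approximate average molecular weight of an unmodified protein chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(len(SEQUENCE))
print(round(molecular_weight(SEQUENCE)))
```

The computed weight should come out close to the 30244 Da stated in the entry. The difference from the 277 residues and 30,375 Da quoted in the Introduction is about the mass of one methionine residue, consistent with the initiator methionine (INIT_MET) being absent from the listed chain.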


NCBI

LOCUS       HUMCRE       1209 bp    mRNA    linear   PRI 01-NOV-1994
DEFINITION  Human carbonyl reductase mRNA, complete cds.
ACCESSION   J04056 X51818
VERSION     J04056.1  GI:181036
KEYWORDS    carbonyl reductase.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1  (bases 1 to 1209)
  AUTHORS   Wermuth,B., Bohren,K.M., Heinemann,G., von Wartburg,J.P. and
            Gabbay,K.H.
  TITLE     Human carbonyl reductase. Nucleotide sequence analysis of a cDNA
            and amino acid sequence of the encoded protein
  JOURNAL   J. Biol. Chem. 263 (31), 16185-16188 (1988)
   PUBMED   3141401
COMMENT     Original source text: Human placenta, cDNA to mRNA, (library of
            Clontech).
            Draft entry and computer-readable sequence for [1] kindly provided
            by B.Wermuth, 31-AUG-1988.
FEATURES             Location/Qualifiers
     source          1..1209
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /map="21"
     gene            1..1209
                     /gene="CBR"
     CDS             94..927
                     /gene="CBR"
                     /EC_number="1.1.1.184"
                     /codon_start=1
                     /product="carbonyl reductase"
                     /protein_id="AAA52070.1"
                     /db_xref="GI:181037"
                     /db_xref="GDB:G00-126-610"
                     /translation="MSSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQ
                     AAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTP
                     FHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRS
                     ETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKG
                     DKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQ
                     W"
     polyA_site      1209
                     /gene="CBR"
ORIGIN      212 bp upstream of PstI site.
        1 cagactcgag cagtctctgg aacacgctgc ggggctcccg ggcctgagcc aggtctgttc
       61 tccacgcagg tgttccgcgc gccccgttca gccatgtcgt ccggcatcca tgtagcgctg
      121 gtgactggag gcaacaaggg catcggcttg gccatcgtgc gcgacctgtg ccggctgttc
      181 tcgggggacg tggtgctcac ggcgcgggac gtgacgcggg gccaggcggc cgtacagcag
      241 ctgcaggcgg agggcctgag cccgcgcttc caccagctgg acatcgacga tctgcagagc
      301 atccgcgccc tgcgcgactt cctgcgcaag gagtacgggg gcctggacgt gctggtcaac
      361 aacgcgggca tcgccttcaa ggttgctgat cccacaccct ttcatattca agctgaagtg
      421 acgatgaaaa caaatttctt tggtacccga gatgtgtgca cagaattact ccctctaata
      481 aaaccccaag ggagagtggt gaacgtatct agcatcatga gcgtcagagc ccttaaaagc
      541 tgcagcccag agctgcagca gaagttccgc agtgagacca tcactgagga ggagctggtg
      601 gggctcatga acaagtttgt ggaggataca aagaagggag tgcaccagaa ggagggctgg
      661 cccagcagcg catacggggt gacgaagatt ggcgtcaccg ttctgtccag gatccacgcc
      721 aggaaactga gtgagcagag gaaaggggac aagatcctcc tgaatgcctg ctgcccaggg
      781 tgggtgagaa ctgacatggc gggacccaag gccaccaaga gcccagaaga aggtgcagag
      841 acccctgtgt acttggccct tttgccccca gatgctgagg gtccccatgg acaatttgtt
      901 tcagagaaga gagttgaaca gtggtgagct gggctcacag ctccatccat gggccccatt
      961 ttgtaccttg tcctgagttg gtccaaaggg catttacaat gtcataaata tccttatata
     1021 agaaaaaaaa tgatctctta tcaattagca ctcactaatg tactactaat tgagcaacct
     1081 acgcactcag ttgactacgt aaatctgtca ggtcttttgt gatttcctct gatgcaggag
     1141 aggaaaaatt gtaattgatg aaaataatga atgaaaatca acagatgaat aaatggttct
     1201 ttataagtg

EMBL

General Information
Primary Accession #: J04056
Accession #: J04056 X51818
Entry Name: EMBL:HSCRE
Molecule Type: mRNA
Sequence Length: 1209
Entry Division: HUM
Sequence Version: J04056.1
Creation Date: 06-JUL-1989
Modification Date: 17-APR-2005

Description
Description: Human carbonyl reductase mRNA, complete cds.
Keywords: carbonyl reductase.;
Organism: Homo sapiens (human)
Organism Classification: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

References
1. Wermuth,B.; Bohren,K.M.; Heinemann,G.; von Wartburg,J.P.; Gabbay,K.H.;
Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein
J. Biol. Chem. 263(31):16185-16188 (1988)
Pubmed 3141401
Position 1-1209

2. Forrest,G.L.; Akman,S.; Krutzik,S.; Paxton,R.J.; Sparkes,R.S.; Doroshow,J.; Felsted,R.L.; Glover,C.J.; Mohandas,T.; Bachur,N.R.;
Induction of a human carbonyl reductase gene located on chromosome 21
Biochim. Biophys. Acta 1048(2-3):149-155(1990).
DOI 10.1016/0167-4781(90)90050-C
Pubmed 2182121
Position 1-1209

GDB 181839. 181840. 4571939.

Features

Key       Location    Qualifier     Value
source    1..1209     organism      Homo sapiens
                      map           21
                      mol_type      mRNA
                      db_xref       taxon:9606
CDS       94..927     codon_start   1
                      gene          CBR
                      product       carbonyl reductase
                      EC_number     1.1.1.184
                      db_xref       GDB:126610
                      db_xref       GOA:P16152
                      db_xref       HGNC:1548
                      db_xref       HSSP:1N5D
                      db_xref       InterPro:IPR002198
                      db_xref       InterPro:IPR002347
                      db_xref       PDB:1WMA
                      db_xref       UniProtKB/Swiss-Prot:P16152
                      protein_id    AAA52070.1
                      translation

Sequence

Characteristics: Length: 1209 BP, A Count: 302, C Count: 306, G Count: 349, T Count: 252, Others Count: 0
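The base counts reported for the entry can be checked for internal consistency and turned into a GC content. The sketch below is illustrative: the function name `base_composition` is hypothetical, and the counts are the ones quoted in the EMBL sequence characteristics (A 302, C 306, G 349, T 252).

```python
from collections import Counter


def base_composition(dna):
    """Return (per-base counts, GC fraction) for a nucleotide string."""
    counts = Counter(dna.upper())
    total = sum(counts.values())
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return counts, gc


# Counts quoted in the EMBL entry for J04056.
reported = {"A": 302, "C": 306, "G": 349, "T": 252}
reported_total = sum(reported.values())
reported_gc = (reported["G"] + reported["C"]) / reported_total

print(reported_total)         # should equal the stated length of 1209 bp
print(round(reported_gc, 3))  # GC fraction implied by the reported counts

# The same function applied to a short, hypothetical sequence.
counts, gc = base_composition("ACGTGG")
print(dict(counts), round(gc, 3))
```

The four counts add up to the stated sequence length of 1209 bp, so no ambiguous bases are present, in agreement with “Others Count: 0”.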

 

PDB

Title: Crystal structure of human CBR1 in complex with Hydroxy-PP

Authors: Rauh, D., Bateman, R., Shokat, K.M.

Primary Citation: Tanaka, M., Bateman, R., Rauh, D., Vaisberg, E., Ramachandani, S., Zhang, C., Hansen, K.C., Burlingame, Shokat, K.M., Adams, C.L. An unbiased cell morphology-based screen for new, biologically active small molecules. PLoS Biol. v3 pp.128-128, 2005

History: Deposition 2004-07-06; Release 2005-04-26

Experimental Method
Type: X-RAY DIFFRACTION

Parameters
Resolution [Å]: 1.24
R-Value: 0.129 (all)
R-Free: 0.167
Space Group: P 21 21 21

Unit Cell
Length [Å]: a 54.45, b 55.35, c 95.93
Angles [°]: alpha 90.00, beta 90.00, gamma 90.00

Molecular Description
Asymmetric Unit: monomer (protein 276 residues)
Polymer: 1; Molecule: Carbonyl reductase [NADPH] 1; Chains: A; EC No.: 1.1.1.184

Functional Class: Oxidoreductase

Source
Polymer: 1; Scientific Name: Homo sapiens; Common Name: Human; system: Homo sapiens

Chemical Component

Identifier   Name                                                               Formula
SO4          SULFATE ION                                                        O4 S 2-
PE5          3,6,9,12,15,18,21,24-OCTAOXAHEXACOSAN-1-OL                         C18 H38 O9
P33          3,6,9,12,15,18-HEXAOXAICOSANE-1,20-DIOL                            C14 H30 O8
NDP          NADPH DIHYDRO-NICOTINAMIDE-ADENINE-DINUCLEOTIDE PHOSPHATE          C21 H30 N7 O17 P3
AB3          3-(4-AMINO-1-TERT-BUTYL-1H-PYRAZOLO[3,4-D]PYRIMIDIN-3-YL)PHENOL    C15 H17 N5 O

GO Terms
Polymer: Carbonyl reductase [NADPH] 1 (1WMA:A)
Molecular Function: oxidoreductase activity
Biological Process: metabolism
Cellular Component: none


NiceZyme View of ENZYME: EC 1.1.1.184Official Name

Carbonyl reductase (NADPH).

Alternative Name(s)

Aldehyde reductase I.

NADPH-dependent carbonyl reductase.

Prostaglandin 9-ketoreductase.

Xenobiotic ketone reductase.

Reaction catalysed

R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH

Comment(s)

Acts on a wide range of carbonyl compounds, including quinones, aromatic aldehydes, ketoaldehydes, daunorubicin, and prostaglandins E and F, reducing them to the corresponding alcohol.

B-specific with respect to NADPH (cf. EC 1.1.1.2).

Cross-references

PROSITE PDOC00060

BRENDA 1.1.1.184

PUMA2 1.1.1.184

PRIAM enzyme-specific profiles 1.1.1.184

Kyoto University LIGAND chemical database 1.1.1.184

IUBMB Enzyme Nomenclature 1.1.1.184

IntEnz 1.1.1.184


MEDLINE Find literature relating to 1.1.1.184

MetaCyc 1.1.1.184

UniProtKB/Swiss-Prot

Q21929, CBR2_CAEEL;   P08074, CBR2_MOUSE;   Q29529, CBR2_PIG;  O75828, DHC3_HUMAN;   P16152, DHCA_HUMAN;   P48758, DHCA_MOUSE;  Q28960, DHCA_PIG;   Q5RCU5, DHCA_PONPY;   P47844, DHCA_RABIT;  P47727, DHCA_RAT;   Q8SPU8, DHRS4_BOVIN;   Q9BTZ2, DHRS4_HUMAN;  Q99LB2, DHRS4_MOUSE;   Q8WNV7, DHRS4_PIG;   Q5RCF8, DHRS4_PONPY;  Q9GKX2, DHRS4_RABIT;   Q8VID1, DHRS4_RAT;  

HGNC

Core Data

Approved Symbol: CBR1
Approved Name: carbonyl reductase 1
HGNC ID: HGNC:1548
Status: Approved
Chromosome: 21q22.1
Previous Symbols: CBR
Previous Names:
Aliases:

Database Links

Enzyme IDs: 1.1.1.184
Pubmed IDs: 8432528
OMIM ID (mapped data): 114830
Entrez Gene ID (mapped data): 873
RefSeq (mapped data): NM_001757
UniProt ID (mapped data): P16152
Gene Symbol Links: UCSC Browser, UCSC Index, Ensembl GeneView, GENATLAS, GeneCards, GeneClinics/GeneTests, Vega


GENE CARDS

Chromosome: 21

Entrez Gene cytogenetic band: 21q22.13

Ensembl cytogenetic band: 21q22.12

Nature (405: 311-319) cytogenetic band: 21q22.13

Gene in genomic location: bands according to Ensembl, locations according to (and/or Entrez Gene and/or Ensembl if different)

GeneLoc gene densities for chromosome 21

GC21P036364

              GeneLoc                    Nature: 405, 311-319
Start:        36,364,191 bp from pter    23,019,072 bp from centromere
End:          36,367,332 bp from pter    23,022,213 bp from centromere
Size:         3,141 bases                3,142 bases
Orientation:  plus strand                plus strand
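The two Size values in the record above differ by one base. One plausible reading, offered here only as an assumption and not as something either source states, is that one figure is the plain difference between End and Start while the other counts both end positions. A minimal sketch (the helper `span_length` is hypothetical):

```python
def span_length(start, end, inclusive=True):
    """Length of a genomic interval, counting both ends when inclusive is True."""
    return end - start + (1 if inclusive else 0)


# Coordinates copied from the GeneLoc and Nature columns above.
geneloc_size = span_length(36_364_191, 36_367_332, inclusive=False)
nature_size = span_length(23_019_072, 23_022_213, inclusive=True)
print(geneloc_size, nature_size)
```

Under this assumption the first value matches the 3,141 bases listed for GeneLoc and the second matches the 3,142 bases listed for Nature.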


GENATLAS

FLASH GENE

Symbol CBR1 last update : 30/4/2002

HGNC name carbonyl reductase 1

HGNC id 1548

Location 21q22.2

Synonym symbol(s) CRN

EC number 1.1.1.184, 1.1.1.189, 1.1.1.197

DNA

TYPE functioning gene

STRUCTURE 3.1 kb     3 Exon(s)

10 Kb 5' upstream gene genomic sequence study

SUBCELLULAR LOCALIZATION     intracellular,cytoplasm,cytosolic


FAMILY short chain dehydrogenases/reductases (SDR) family

CATEGORY enzyme

basic FUNCTION carbonyl reductase, NADPH-dependent oxidoreductase, prostaglandin-E2 9-reductase, prostaglandin 9-ketoreductase

ENSEMBL

Genomic Location

This gene can be found on Chromosome 21 at location 36,364,191-36,367,332.

The start of this gene is located in Contig AP000688.1.1.171703.

Description Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197). Source: Uniprot/SWISSPROT P16152

Transcript ENST00000290349

Transcript

CBR1 (HGNC Symbol ID)

This transcript is a member of the human CCDS set: CCDS13641

Transcript information

Exons: 3   Transcript length: 1,209 bps   Protein length: 277 residues


Transcript structure

Protein features

PFAM

Oxidoreductase

Crystal structure of abad/hsd10 with a bound inhibitor

Short chain dehydrogenase

This family contains a wide variety of dehydrogenases.

FAD/NAD(P)-binding Rossmann fold Superfamily

This family is a member of the FAD/NAD(P)-binding Rossmann fold Superfamily clan. This clan includes the following Pfam members: Trp_halogenase; TrkA_N; ThiF; Thi4; THF_DHG_CYH_C; Shikimate_DH; Semialdhyde_dh; SE; Saccharop_dh; RmlD_sub_bind; Pyr_redox_2; Pyr_redox; Polysacc_synt_2; PDH; OCD_Mu_crystall; NmrA; NAD_Gly3P_dh_N; NAD_binding_5; NAD_binding_4; NAD_binding_3; NAD_binding_2; Mur_ligase; Mqo; Mannitol_dh; Malic_M; Lycopene_cycl; Ldh_1_N; KR; IlvN; HI0933_like; Gp_dh_N; GMC_oxred_N; GIDA; GFO_IDH_MocA; GDI; G6PD_N; FMO-like; FAD_binding_3; FAD_binding_2; F420_oxidored; Epimerase; ELFV_dehydrog; DXP_reductoisom; DapB_N; DAO; CoA_binding; ApbA; Amino_oxidase; AlaDh_PNT_C; AdoHcyase_NAD; ADH_zinc_N; adh_short; 3HCDH_N; 3Beta_HSD; 2-Hacid_dh_C; UDPG_MGDP_dh_N.

The short-chain dehydrogenases/reductases (SDR) family (PUBMED:7742302) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. Because the first member of this family to be characterized was Drosophila alcohol dehydrogenase, the family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases (PUBMED:2707261, PUBMED:1889416, PUBMED:1740120). Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least 2 domains (PUBMED:6789320), the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain, although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains (PUBMED:6789320).

Input Protein sequence in FASTA format

>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

Input multiple Protein sequences in FASTA format

>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

>sp|P48758|DHCA_MOUSE Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Mus musculus (Mouse).
SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQLDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEVTMETNFFGTRDVCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW

>sp|Q28960|DHCA_PIG Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (20-beta-hydroxysteroid dehydrogenase) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydr
SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQLDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNVCTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA

>sp|P47844|DHCA_RABIT Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Oryctolagus cuniculus (Rabbit).
PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQLDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDVCTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW

>sp|P47727|DHCA_RAT Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Rattus norvegicus (Rat).
SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDVCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW
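Multi-record FASTA input like the one above has to be split into header and sequence before any tool can use it. The following is a minimal sketch of such a parser; `parse_fasta` is a hypothetical helper, and the two sequences in the example are deliberately truncated to their first twelve residues for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping the first header word to the sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the identifier up to the first whitespace as the key.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Truncated example records (first 12 residues only, for illustration).
example = """>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1
SSGIHV
ALVTGG
>sp|P48758|DHCA_MOUSE Carbonyl reductase [NADPH] 1
SSSRPVALVTGA
"""
records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```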

Input nucleotide sequence in FASTA format.

>gi|181036|gb|J04056.1|HUMCRE Human carbonyl reductase mRNA, complete cds
CAGACTCGAGCAGTCTCTGGAACACGCTGCGGGGCTCCCGGGCCTGAGCCAGGTCTGTTCTCCACGCAGGTGTTCCGCGCGCCCCGTTCAGCCATGTCGTCCGGCATCCATGTAGCGCTGGTGACTGGAGGCAACAAGGGCATCGGCTTGGCCATCGTGCGCGACCTGTGCCGGCTGTTCTCGGGGGACGTGGTGCTCACGGCGCGGGACGTGACGCGGGGCCAGGCGGCCGTACAGCAGCTGCAGGCGGAGGGCCTGAGCCCGCGCTTCCACCAGCTGGACATCGACGATCTGCAGAGCATCCGCGCCCTGCGCGACTTCCTGCGCAAGGAGTACGGGGGCCTGGACGTGCTGGTCAACAACGCGGGCATCGCCTTCAAGGTTGCTGATCCCACACCCTTTCATATTCAAGCTGAAGTGACGATGAAAACAAATTTCTTTGGTACCCGAGATGTGTGCACAGAATTACTCCCTCTAATAAAACCCCAAGGGAGAGTGGTGAACGTATCTAGCATCATGAGCGTCAGAGCCCTTAAAAGCTGCAGCCCAGAGCTGCAGCAGAAGTTCCGCAGTGAGACCATCACTGAGGAGGAGCTGGTGGGGCTCATGAACAAGTTTGTGGAGGATACAAAGAAGGGAGTGCACCAGAAGGAGGGCTGGCCCAGCAGCGCATACGGGGTGACGAAGATTGGCGTCACCGTTCTGTCCAGGATCCACGCCAGGAAACTGAGTGAGCAGAGGAAAGGGGACAAGATCCTCCTGAATGCCTGCTGCCCAGGGTGGGTGAGAACTGACATGGCGGGACCCAAGGCCACCAAGAGCCCAGAAGAAGGTGCAGAGACCCCTGTGTACTTGGCCCTTTTGCCCCCAGATGCTGAGGGTCCCCATGGACAATTTGTTTCAGAGAAGAGAGTTGAACAGTGGTGAGCTGGGCTCACAGCTCCATCCATGGGCCCCATTTTGTACCTTGTCCTGAGTTGGTCCAAAGGGCATTTACAATGTCATAAATATCCTTATATAAGAAAAAAAATGATCTCTTATCAATTAGCACTCACTAATGTACTACTAATTGAGCAACCTACGCACTCAGTTGACTACGTAAATCTGTCAGGTCTTTTGTGATTTCCTCTGATGCAGGAGAGGAAAAATTGTAATTGATGAAAATAATGAATGAAAATCAACAGATGAATAAATGGTTCTTTATAAGTG
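The mRNA above and the protein sequence listed earlier are related by translation of the open reading frame. The sketch below translates a short stretch using the standard genetic code; the helper `translate` and the table construction are illustrative choices, and the example string is the first 24 nucleotides of the reading frame as they appear in the J04056 sequence above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


orf_start = "ATGTCGTCCGGCATCCATGTAGCG"
peptide = translate(orf_start)
print(peptide)
```

The translated peptide starts with the initiator methionine followed by the residues SSGIHVA, which is how the protein sequence used throughout this report begins.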


B. TOOLS AND TECHNIQUES

1. HOMOLOGY AND SIMILARITY TOOLS

1.1. PAIRWISE SEQUENCE ANALYSIS TOOLS

1.1.1 BLASTP

Reference:

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1149255627-4454-175385043004.BLASTQ1

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples
3,658,925 sequences; 1,257,151,091 total letters

Query= sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
Length=276

ALIGNMENT

> gi|15215242|gb|AAH12714.1| Carbonyl reductase 1 [Mus musculus]
Length=277

 Score = 482 bits (1241), Expect = 6e-135
 Identities = 242/276 (87%), Positives = 254/276 (92%), Gaps = 0/276 (0%)

Query  1    SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ  60
            SS   VALVTG NKGIG AI RDLCR FSGDVVL ARD  RGQ AVQ+LQAEGLSPRFHQ
Sbjct  2    SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQ  61

Query  61   LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV  120
            LDID+ QSIRALRDFL KEYGGLDVLVNNAGIAFKV D TPFHIQAEVTMKTNFFGTRDV
Sbjct  62   LDIDNPQSIRALRDFLLKEYGGLDVLVNNAGIAFKVNDDTPFHIQAEVTMKTNFFGTRDV  121

Query  121  CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK  180
            C ELLPLIKPQGRVVNVSS++S+RALK+C  ELQQKFRSETITEEELVGLMNKFVEDTKK
Sbjct  122  CKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKK  181

Query  181  GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT  240
            GVH +EGWP+SAYGVTKIGVTVLSRI ARKL+EQR+GDKILLNACCPGWVRTDMAGPKAT
Sbjct  182  GVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRRGDKILLNACCPGWVRTDMAGPKAT  241

Query  241  KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            KSPEEGAETPVYLALLPPDAEGPHGQFV +K+VE W
Sbjct  242  KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW  277
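The Identities figure in a BLAST hit is the number of alignment columns where the query and subject residues are the same, divided by the alignment length. A minimal sketch of that count follows; `count_identities` is a hypothetical helper, and the example uses the last row (residues 241 to 276) of the mouse hit above. The integer division reflects an observation about this report only: the percentages printed here (for example 242/276 shown as 87%) look truncated rather than rounded, which is treated as an assumption.

```python
def count_identities(query, subject):
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


query_row = "KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW"
sbjct_row = "KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW"
identical, length = count_identities(query_row, sbjct_row)
print(identical, length)

# Whole-alignment percentage from the reported counts (assumed truncation).
percent = 100 * 242 // 276
print(percent)
```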

> gi|76779821|gb|AAI05894.1| Carbonyl reductase [Rattus norvegicus]
Length=277

 Score = 478 bits (1231), Expect = 9e-134
 Identities = 238/276 (86%), Positives = 254/276 (92%), Gaps = 0/276 (0%)

Query  1    SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ  60
            SS   VALVTG NKGIG AIVRDLCR F GDVVLTARD +RG  AV+QLQ EGLSPRFHQ
Sbjct  2    SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQ  61

Query  61   LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV  120
            LDID+ QSIRALRDFL +EYGGL+VLVNNAGIAFKV DPTPFHIQAEVTMKTNFFGT+DV
Sbjct  62   LDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDV  121

Query  121  CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK  180
            C ELLP+IKPQGRVVNVSS +S+RALKSCSPELQQKFRSETITEEELVGLMNKFVED KK
Sbjct  122  CKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFVEDAKK  181

Query  181  GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT  240
            GVH KEGWP+SAYGVTKIGVTVLSRI+ARKL+E+R+ DKILLNACCPGWVRTDMAGPKAT
Sbjct  182  GVHAKEGWPNSAYGVTKIGVTVLSRIYARKLTEERREDKILLNACCPGWVRTDMAGPKAT  241

Query  241  KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            KSPEEGAETPVYLALLPP AEGPHGQFV +K+VE W
Sbjct  242  KSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW  277

> gi|1352257|sp|P47844|DHCA_RABIT Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1)
 gi|458714|gb|AAA77670.1| NADPH-dependent carbonyl reductase
Length=277

 Score = 464 bits (1195), Expect = 1e-129
 Identities = 230/271 (84%), Positives = 246/271 (90%), Gaps = 0/271 (0%)

Query  6    VALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDD  65
            VALVTG NKG+G AI R LCRLFSGDV+LTA+D  +GQAAVQQLQAEGLSPRFHQLDI D
Sbjct  7    VALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQLDITD  66

Query  66   LQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELL  125
            LQSIRALRDFLR+ YGGL+VLVNNA IAFK+ D TPFHIQAEVTMKTNF GTRDVCTELL
Sbjct  67   LQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDVCTELL  126

Query  126  PLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQK  185
            PL++P GRVVNVSS+  +RALKSCSPELQQKFRSETITEEELVGLM KFVEDTKKGVHQ
Sbjct  127  PLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQT  186

Query  186  EGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEE  245
            EGWP +AYGVTK+GVTVLSRI AR LSE R GDKIL+NACCPGWVRTDM GP ATKSPEE
Sbjct  187  EGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNATKSPEE  246

Query  246  GAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            GAETPVYLALLPPDAEGPHGQFV +K+VEQW
Sbjct  247  GAETPVYLALLPPDAEGPHGQFVMDKKVEQW  277

1.1.2 FASTA

FASTA searches a protein or DNA sequence data bank
 version 3.4t25 Sept 2, 2005
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query library @ vs +uniprot library
searching /ebi/services/idata/v1503/fastadb/uniprot library

 1>>>Sequence - 276 aa
 vs UniProt library
1044150180 residues in 3185498 sequences
 statistics sampled from 60000 to 3176257 sequences
 Expectation_n fit: rho(ln(x))= 4.9912+/-0.000183; mu= 10.3675+/- 0.010
 mean_var=68.0021+/-13.901, 0's: 45 Z-trim: 193 B-trim: 5575 in 2/65
 Lambda= 0.155530

FASTA (3.47 Mar 2004) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 36, opt: 24, open/ext: -10/-2, width: 16
 Scan time: 17.890

The best scores are:

>>UNIPROT:Q3SZD7_BOVIN Q3SZD7 Carbonyl reductase 1. (277 aa)
 initn: 1613 init1: 1613 opt: 1613 Z-score: 1959.4 bits: 370.4 E(): 7.9e-101
Smith-Waterman score: 1613; 88.768% identity (95.652% similar) in 276 aa overlap (1-276:2-277)

                10        20        30        40        50
Sequen  SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH
        ::.  ::::::.:::::..::::::: ::::::::::: .::.::::::::::::: ::
UNIPRO MSSSNCVALVTGANKGIGFVIVRDLCRRFSGDVVLTARDEARGRAAVQQLQAEGLSPLFH
               10        20        30        40        50        60

      60        70        80        90       100       110
Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD
       :::::: :::::::::::::::::::::::::::::.:: ::::::::::::::::::::
UNIPRO QLDIDDRQSIRALRDFLRKEYGGLDVLVNNAGIAFKTADTTPFHIQAEVTMKTNFFGTRD
               70        80        90       100       110       120

     120       130       140       150       160       170
Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK
       ::::::::::::::::::::..:: .::.:: ::::::::::::::::::::::::::::
UNIPRO VCTELLPLIKPQGRVVNVSSFVSVNSLKKCSRELQQKFRSETITEEELVGLMNKFVEDTK
              130       140       150       160       170       180

     180       190       200       210       220       230
Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA
       .:::.:::::..:::::::::::::::::::::::: ::::::::::::::::::.::::
UNIPRO NGVHRKEGWPDTAYGVTKIGVTVLSRIHARKLSEQRGGDKILLNACCPGWVRTDMGGPKA
              190       200       210       220       230       240

     240       250       260       270
Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
       .::::::::::::::::: :::::::.:.::::: ::
UNIPRO SKSPEEGAETPVYLALLPSDAEGPHGEFISEKRVVQW
              250       260       270

>>UNIPROT:Q91X28_MOUSE Q91X28 Carbonyl reductase 1. (277 aa)
 initn: 1597 init1: 1597 opt: 1597 Z-score: 1940.0 bits: 366.8 E(): 9.5e-100
Smith-Waterman score: 1597; 87.681% identity (94.565% similar) in 276 aa overlap (1-276:2-277)

                10        20        30        40        50
Sequen  SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH
        ::.  ::::::.:::::.::.::::: :::::::.:::  :::.:::.:::::::::::
UNIPRO MSSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFH
               10        20        30        40        50        60

      60        70        80        90       100       110
Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD
       :::::. :::::::::: ::::::::::::::::::: : ::::::::::::::::::::
UNIPRO QLDIDNPQSIRALRDFLLKEYGGLDVLVNNAGIAFKVNDDTPFHIQAEVTMKTNFFGTRD
               70        80        90       100       110       120

     120       130       140       150       160       170
Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK
       :: :::::::::::::::::..:.::::.:  ::::::::::::::::::::::::::::
UNIPRO VCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTK
              130       140       150       160       170       180

     180       190       200       210       220       230
Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA
       :::: .::::.:::::::::::::::: ::::.:::.:::::::::::::::::::::::
UNIPRO KGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRRGDKILLNACCPGWVRTDMAGPKA
              190       200       210       220       230       240

     240       250       260       270
Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
       :::::::::::::::::::::::::::::..:.:: :
UNIPRO TKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW
              250       260       270

>>UNIPROT:Q3KR58_RAT Q3KR58 Carbonyl reductase. (277 aa)
 initn: 1582 init1: 1582 opt: 1582 Z-score: 1921.8 bits: 363.4 E(): 9.8e-99
Smith-Waterman score: 1582; 86.232% identity (94.565% similar) in 276 aa overlap (1-276:2-277)

                10        20        30        40        50
Sequen  SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH
        ::   ::::::.:::::.:::::::: : ::::::::: .::. ::.:::.::::::::
UNIPRO MSSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFH
               10        20        30        40        50        60

      60        70        80        90       100       110
Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD
       :::::. :::::::::: .:::::.::::::::::::.::::::::::::::::::::.:
UNIPRO QLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQD
               70        80        90       100       110       120

     120       130       140       150       160       170
Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK
       :: ::::.:::::::::::: .:.::::::::::::::::::::::::::::::::::.:
UNIPRO VCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFVEDAK
              130       140       150       160       170       180

     180       190       200       210       220       230
Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA
       :::: :::::.::::::::::::::::.::::.:.:. ::::::::::::::::::::::
UNIPRO KGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLTEERREDKILLNACCPGWVRTDMAGPKA
              190       200       210       220       230       240

     240       250       260       270
Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
       ::::::::::::::::::: :::::::::..:.:: :
UNIPRO TKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW
              250       260       270

>>UNIPROT:DHCA_PIG Q28960 Carbonyl reductase [NADPH] 1 ( (288 aa)
 initn: 1547 init1: 1547 opt: 1547 Z-score: 1879.1 bits: 355.6 E(): 2.3e-96
Smith-Waterman score: 1547; 84.420% identity (94.565% similar) in 276 aa overlap (1-276:1-276)

               10        20        30        40        50        60
Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ
       ::. .::::::.:::::.:::::::: :.::::::::::.::::::.:::::::::::::
UNIPRO SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQ
               10        20        30        40        50        60

               70        80        90       100       110       120
Sequen LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV
       ::: :::::::: :::::::::::::::::.:::.. .:::::::::.::::::.:::.:
UNIPRO LDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNV
               70        80        90       100       110       120

              130       140       150       160       170       180
Sequen CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK
       :::::::::::::::::::  .::::. :::::::::.:::::::::::::::::::::.
UNIPRO CTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKN
              130       140       150       160       170       180

              190       200       210       220       230       240
Sequen GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT
       :::.:::: .:.::::::::.:::::.:::: ::: ::::::::::::::::::.::::
UNIPRO GVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAP
              190       200       210       220       230       240

              250       260       270
Sequen KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
       :::: :::::::::::: ::::::::::..:.: .:
UNIPRO KSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA
              250       260       270       280

>>UNIPROT:DHCA_RABIT P47844 Carbonyl reductase [NADPH] 1 (276 aa)
 initn: 1542 init1: 1542 opt: 1542 Z-score: 1873.3 bits: 354.4 E(): 4.9e-96
Smith-Waterman score: 1542; 84.871% identity (94.834% similar) in 271 aa overlap (6-276:6-276)

               10        20        30        40        50        60
Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ
            ::::::.:::.:.::.: :::::::::.:::.: ..:::::::::::::::::::
UNIPRO PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQ
               10        20        30        40        50        60

               70        80        90       100       110       120
Sequen LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV
       ::: :::::::::::::. ::::.:::::: ::::. : ::::::::::::::: :::::
UNIPRO LDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDV
               70        80        90       100       110       120

              130       140       150       160       170       180
Sequen CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK
       :::::::..: ::::::::.  .::::::::::::::::::::::::::::.::::::::
UNIPRO CTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKK
              130       140       150       160       170       180

              190       200       210       220       230       240
Sequen GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT
       :::: ::::..::::::.::::::::.::.:::.: :::::.::::::::::::.::.::
UNIPRO GVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNAT
              190       200       210       220       230       240

              250       260       270
Sequen KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
       :::::::::::::::::::::::::::: .:.::::
UNIPRO KSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW
              250       260       270

1.1.3 EMBOSS ALIGN (NEEDLE)

########################################
# Program: needle
# Rundate: Fri Jun 02 15:08:57 2006
# Align_format: srspair
# Report_file: /ebi/extserv/old-work/needle-20060602-15085486674353.output
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: DHCA_HUMAN
# 2: DCXR_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 317
# Identity:      73/317 (23.0%)
# Similarity:   108/317 (34.1%)
# Gaps:         114/317 (36.0%)
# Score: 119.5
#
#
#=======================================

DHCA_HUMAN         1     SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAV     46
                         .:|..| ||||..||||...|:.|..  :|..|:.   |:|.||.:
DCXR_HUMAN         1 MELFLAGRRV-LVTGAGKGIGRGTVQALHA--TGARVVA---VSRTQADL     44

DHCA_HUMAN        47 QQLQAE--GLSPRFHQLDIDDLQSI-RALRDFLRKEYGGLDVLVNNAGIA     93
                     ..|..|  |:.|..  :|:.|.::. |||     ...|.:|:|||||.:|
DCXR_HUMAN        45 DSLVRECPGIEPVC--VDLGDWEATERAL-----GSVGPVDLLVNNAAVA     87

DHCA_HUMAN        94 F--------KVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVV    135
                     .        |.|....|.:.....::.:....|.:....:|     |.:|
DCXR_HUMAN        88 LLQPFLEVTKEAFDRSFEVNLRAVIQVSQIVARGLIARGVP-----GAIV    132

DHCA_HUMAN       136 NVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQK    185
                     ||||..|.||:.:
DCXR_HUMAN       133 NVSSQCSQRAVTN-------------------------------------    145

DHCA_HUMAN       186 EGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMA    235
                         .|.|..||..:.:|:::.|.:|...    ||.:||..|..|.|.|.
DCXR_HUMAN       146 ----HSVYCSTKGALDMLTKVMALELGPH----KIRVNAVNPTVVMTSMG    187

DHCA_HUMAN       236 GPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW             276
                       :||.|....|:|.:...        |.|:|...:.|...
DCXR_HUMAN       188 --QATWSDPHKAKTMLNRI--------PLGKFAEVEHVVNAILFLLSDRS    227

DHCA_HUMAN       277                                                       276

DCXR_HUMAN       228 GMTTGSTLPVEGGFWAC                                     244

#---------------------------------------
#---------------------------------------
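The needle program performs a global alignment of the Needleman-Wunsch type, which scores the two sequences end to end. The sketch below computes only the optimal global score with a deliberately simplified scheme (fixed match and mismatch values and a linear gap penalty). These parameters are assumptions for illustration; they are not the EBLOSUM62 matrix or the 10.0/0.5 affine gap penalties used in the run above, so the scores are not comparable.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "ACGT"))
print(needleman_wunsch_score("ACGT", "AGT"))
```

Because every residue of both sequences must be placed, unmatched ends are penalised as gaps, which is why the needle report above shows a long alignment (317 columns) with many gap positions.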

1.1.4 EMBOSS ALIGN (WATER)

########################################
# Program: water
# Rundate: Fri Jun 02 15:11:06 2006
# Align_format: srspair
# Report_file: /ebi/extserv/old-work/water-20060602-15110276417491.output
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: DHCA_HUMAN
# 2: DCXR_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 268
# Identity:      71/268 (26.5%)
# Similarity:   103/268 (38.4%)
# Gaps:          75/268 (28.0%)
# Score: 130.0
#
#
#=======================================

DHCA_HUMAN         8 LVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAE--GLS     55
                     ||||..||||...|:.|..  :|..|:.   |:|.||.:..|..|  |:.
DCXR_HUMAN        11 LVTGAGKGIGRGTVQALHA--TGARVVA---VSRTQADLDSLVRECPGIE     55

DHCA_HUMAN        56 PRFHQLDIDDLQSI-RALRDFLRKEYGGLDVLVNNAGIAF--------KV     96
                     |..  :|:.|.::. |||     ...|.:|:|||||.:|.        |.
DCXR_HUMAN        56 PVC--VDLGDWEATERAL-----GSVGPVDLLVNNAAVALLQPFLEVTKE     98

DHCA_HUMAN        97 ADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRAL    146
                     |....|.:.....::.:....|.:....:|     |.:|||||..|.||:
DCXR_HUMAN        99 AFDRSFEVNLRAVIQVSQIVARGLIARGVP-----GAIVNVSSQCSQRAV    143

DHCA_HUMAN       147 KSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVT    196
                     .:                                         .|.|..|
DCXR_HUMAN       144 TN-----------------------------------------HSVYCST    152

DHCA_HUMAN       197 KIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEG    246
                     |..:.:|:::.|.:|...    ||.:||..|..|.|.|.  :||.|....
DCXR_HUMAN       153 KGALDMLTKVMALELGPH----KIRVNAVNPTVVMTSMG--QATWSDPHK    196

DHCA_HUMAN       247 AETPVYLALLPPDAEGPH    264
                     |:|.:....|...||..|
DCXR_HUMAN       197 AKTMLNRIPLGKFAEVEH    214

#---------------------------------------
#---------------------------------------
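The water program performs a local alignment of the Smith-Waterman type, which reports only the best-scoring region shared by the two sequences. The sketch below computes the optimal local score with a simplified scheme (fixed match and mismatch values and a linear gap penalty). As with any such toy scoring, these parameters are assumptions for illustration and differ from the EBLOSUM62 matrix and affine gap penalties of the run above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are clipped at zero so an alignment can start anywhere.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("AAAGGG", "TTTGGGCCC"))
print(smith_waterman_score("AAA", "CCC"))
```

The clipping at zero is what lets a local alignment drop poorly matching ends, consistent with the water report above covering fewer columns (268) than the needle report (317) for the same pair.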

1.2 MULTIPLE SEQUENCE ANALYSIS TOOLS

1.2.1 CLUSTAL W

CLUSTAL W (1.83) multiple sequence alignment

sp|P48758|DHCA_MOUSE      SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQ 60
sp|P47727|DHCA_RAT        SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQ 60
sp|P16152|DHCA_HUMAN      SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ 60
sp|Q28960|DHCA_PIG        SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQ 60
sp|P47844|DHCA_RABIT      PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQ 60
                          .*.  ******.***:*:**.* *** * ***:*:*:*  :*: **::**:*********

sp|P48758|DHCA_MOUSE      LDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEVTMETNFFGTRDV 120
sp|P47727|DHCA_RAT        LDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDV 120
sp|P16152|DHCA_HUMAN      LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV 120
sp|Q28960|DHCA_PIG        LDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNV 120
sp|P47844|DHCA_RABIT      LDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDV 120
                          *** : ****** *** : ****:****:* ***:: : ********:**:*** **::*

sp|P48758|DHCA_MOUSE      CKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKK 180
sp|P47727|DHCA_RAT        CKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFIEDAKK 180
sp|P16152|DHCA_HUMAN      CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK 180
sp|Q28960|DHCA_PIG        CTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKN 180
sp|P47844|DHCA_RABIT      CTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKK 180
                          *.****:::* ********   :***:.*  ******:*************:**:**:*:

sp|P48758|DHCA_MOUSE      GVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNACCPGWVRTDMAGPKAT 240
sp|P47727|DHCA_RAT        GVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNACCPGWVRTDMAGPKAT 240
sp|P16152|DHCA_HUMAN      GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT 240
sp|Q28960|DHCA_PIG        GVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAP 240
sp|P47844|DHCA_RABIT      GVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNAT 240
                          ***  ***..::*****:**:***** **:* *.*  ****:************.**:*.

sp|P48758|DHCA_MOUSE      KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW------------ 276
sp|P47727|DHCA_RAT        KSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW------------ 276
sp|P16152|DHCA_HUMAN      KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW------------ 276
sp|Q28960|DHCA_PIG        KSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA 288
sp|P47844|DHCA_RABIT      KSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW------------ 276
                          **** ************..********* :*:*  *
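In the consensus line under each Clustal W block, an asterisk marks a column where every sequence has the same residue. The sketch below reproduces only that asterisk rule; the ':' and '.' symbols depend on Clustal's residue similarity groups and are intentionally not reproduced here. The helper `conservation_line` is hypothetical, and the example uses the first ten columns of the five sequences aligned above.

```python
def conservation_line(sequences):
    """Mark fully conserved alignment columns with '*', all others with a space."""
    marks = []
    for column in zip(*sequences):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


first_columns = [
    "SSSRPVALVT",  # mouse
    "SSDRPVALVT",  # rat
    "SSGIHVALVT",  # human
    "SSNTRVALVT",  # pig
    "PSDRRVALVT",  # rabbit
]
line = conservation_line(first_columns)
print(repr(line))
```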

1.2.2 T-COFFEE

SCORE=81
*
 BAD AVG GOOD
*
sp|P16152|DHCA_   :  82
sp|P48758|DHCA_   :  81
sp|Q28960|DHCA_   :  81
sp|P47844|DHCA_   :  80
sp|P47727|DHCA_   :  81

sp|P16152|DHCA_   SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGL
sp|P48758|DHCA_   SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGL
sp|Q28960|DHCA_   SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGL
sp|P47844|DHCA_   PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGL
sp|P47727|DHCA_   SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGL

Cons              .*.  ******.***:*:**.* *** * ***:*:*:*  :*: **::**:***

sp|P16152|DHCA_   SPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEV
sp|P48758|DHCA_   SPRFHQLDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEV
sp|Q28960|DHCA_   SPRFHQLDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAEL
sp|P47844|DHCA_   SPRFHQLDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEV
sp|P47727|DHCA_   SPRFHQLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEV

Cons              ********* : ****** *** : ****:****:* ***:: : ********:

sp|P16152|DHCA_   TMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETI
sp|P48758|DHCA_   TMETNFFGTRDVCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETI
sp|Q28960|DHCA_   TMKTNFMGTRNVCTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETI
sp|P47844|DHCA_   TMKTNFDGTRDVCTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETI
sp|P47727|DHCA_   TMKTNFFGTQDVCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETI

Cons              **:*** **::**.****:::* ********   :***:.*  ******:****

sp|P16152|DHCA_   TEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRK
sp|P48758|DHCA_   TEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRR
sp|Q28960|DHCA_   TEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRA
sp|P47844|DHCA_   TEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRG
sp|P47727|DHCA_   TEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERR

Cons              *********:**:**:*:***  ***..::*****:**:***** **:* *.*

sp|P16152|DHCA_   GDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSE
sp|P48758|DHCA_   EDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQD
sp|Q28960|DHCA_   GDKILLNACCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTD
sp|P47844|DHCA_   GDKILVNACCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMD
sp|P47727|DHCA_   EDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQD

Cons               ****:************.**:*.**** ************..********* :

sp|P16152|DHCA_   KRVEQW------------
sp|P48758|DHCA_   KKVEPW------------
sp|Q28960|DHCA_   KKVVEWGVPPESYPWVNA
sp|P47844|DHCA_   KKVEQW------------
sp|P47727|DHCA_   KKVEPW------------

Cons              *:*  *

2. FUNCTIONAL ANALYSIS TOOLS

2.1. PATTERN SEARCH

2.1.1 SCAN PROSITE  

    

hits by patterns: [1 hit (by 1 pattern) on 1 sequence]

Hits by PS00061 ADH_SHORT Short-chain dehydrogenases/reductases family signature:

sp-P16152-DHC~ (276 aa)

180 - 208:   KgvhqkegwpSsaYGVTKIGVtVLSrIHA
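A PROSITE scan works by matching a pattern written in PROSITE syntax against the sequence. The sketch below converts the basic elements of that syntax (x wildcards, repeat counts, allowed and excluded residue sets) into a regular expression. The helper `prosite_to_regex` is hypothetical, and the pattern `Y-x(3)-K` is a simplified motif chosen for illustration; it is not the actual PS00061 pattern. It is applied to the 29-residue hit segment (positions 180 to 208) reported above.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as 'x(3)' or '[ST](2,4)' into core and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


hit_segment = "KgvhqkegwpSsaYGVTKIGVtVLSrIHA".upper()
regex = prosite_to_regex("Y-x(3)-K")
match = re.search(regex, hit_segment)
# Position in the full protein, given that the segment starts at residue 180.
print(regex, match.group(), 180 + match.start())
```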

Short-chain dehydrogenases/reductases family signature

Description:


The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below.

Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
D-β-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
3-β-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
20-β-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
Ribitol 2-dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
Estradiol 17-β-dehydrogenase (EC 1.1.1.62) from human.
Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
Retinol dehydrogenase (EC 1.1.1.105) from mammals.
2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD).
15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
Corticosteroid 11-β-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
7-α-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
Pteridine reductase 1 (EC 1.5.1.33) (gene PTR1) from Leishmania.
2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
Dihydropteridine reductase (EC 1.5.1.34) (HDHPR) from mammals.
Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
Versicolorin reductase from Aspergillus parasiticus (gene VER1).
Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal β-oxidation system of Candida

tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domain in its N-terminal extremity.

Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.

Nitrogen fixation protein fixR from Bradyrhizobium japonicum. Bacillus subtilis protein dltE which is involved in the biosynthesis of D- alanyl-lipoteichoic acid. Human follicular variant translocation protein 1 (FVT1). Mouse adipocyte protein p27. Mouse protein Ke 6. Maize sex determination protein TASSELSEED 2. Sarcophaga peregrina 25 Kd development specific protein.

Centre for Bioinformatics(44)

Page 45: Bio Informatics

Drosophila fat body protein P6. A Listeria monocytogenes hypothetical protein encoded in the internalins gene region. Escherichia coli hypothetical protein yciK. Escherichia coli hypothetical protein ydfG. Escherichia coli hypothetical protein yjgI. Escherichia coli hypothetical protein yjgU. Escherichia coli hypothetical protein yohF. Bacillus subtilis hypothetical protein yoxD. Bacillus subtilis hypothetical protein ywfD. Bacillus subtilis hypothetical protein ywfH. Yeast hypothetical protein YIL124w. Yeast hypothetical protein YIR035c. Yeast hypothetical protein YIR036c. Yeast hypothetical protein YKL055c. Fission yeast hypothetical protein SpAC23D3.11.

2.1.2 INTERPRO


2.1.3 BLOCK

Hits

Query=sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-depende
Size=276 Amino Acids
Blocks Searched=27214
Alignments Done= 8220276
Cutoff combined expected value for hits= 1
Cutoff block expected value for repeats/other= 1
==============================================================================
                                                             Combined
Family                                        Strand  Blocks   E-value
IPB002347 Glucose/ribitol dehydrogenase famil    1    6 of 6   3e-36
IPB002198 Short-chain dehydrogenase/reductase    1    2 of 2   2.3e-09
IPB004358 Bacterial sensor protein C-terminal    1    1 of 4   0.01
IPB001294 Phytochrome                            1    1 of 14  0.14
IPB011489 EMI                                    1    1 of 1   0.45
==============================================================================
>IPB002347 6/6 blocks Combined E-value= 3e-36: Glucose/ribitol dehydrogenase family signature
Block       Frame  Location (aa)  Block E-value
IPB002347A    0    6-23           2e-07
IPB002347B    0    81-92          1.5e-06
IPB002347C    0    126-142        0.0028
IPB002347D    0    193-212        0.034
IPB002347E    0    218-235        0.00017
IPB002347F    0    236-256        0.074

Other reported alignments:

                   |--- 637 amino acids---|
IPB002347          A..............B.............C...D..........E...............F
sp|P16152|DHCA_HUM A::B:C::DEF

IPB002347A <->A (-3,1042):5
DHCA_HUMAN|P16152     6 VALVTGGNKGIGLAIVRD
                        ||||||||||||||||||
sp|P16152|DHCA_HUM    6 VALVTGGNKGIGLAIVRD

IPB002347B A<->B (5,365):57
DHCA_HUMAN|P16152    81 GGLDVLVNNAGI
                        ||||||||||||
sp|P16152|DHCA_HUM   81 GGLDVLVNNAGI

IPB002347C B<->C (-1,327):33
DHCA_HUMAN|P16152   126 PLIKPQGRVVNVSSIMS
                        |||||||||||||||||
sp|P16152|DHCA_HUM  126 PLIKPQGRVVNVSSIMS

IPB002347D C<->D (1,90):50
DHCA_HUMAN|P16152   193 YGVTKIGVTVLSRIHARKLS
                        ||||||||||||||||||||
sp|P16152|DHCA_HUM  193 YGVTKIGVTVLSRIHARKLS

IPB002347E D<->E (-2,260):5
DHCA_HUMAN|P16152   218 DKILLNACCPGWVRTDMA
                        ||||||||||||||||||
sp|P16152|DHCA_HUM  218 DKILLNACCPGWVRTDMA

IPB002347F E<->F (-2,381):0
DHCA_HUMAN|P16152   236 GPKATKSPEEGAETPVYLALL
                        |||||||||||||||||||||
sp|P16152|DHCA_HUM  236 GPKATKSPEEGAETPVYLALL
------------------------------------------------------------------------------
>IPB002198 2/2 blocks Combined E-value= 2.3e-09: Short-chain dehydrogenase/reductase SDR
Block       Frame  Location (aa)  Block E-value
IPB002198A    0    83-92          4.2e-05
IPB002198B    0    173-221        0.041

Other reported alignments:

                   |--- 2598 amino acids---|
IPB002198          A...........................................................B
sp|P16152|DHCA_HUM A:B

IPB002198A <->A (-3,7409):82
DHCA_HUMAN|P16152    83 LDVLVNNAGI
                        ||||||||||
sp|P16152|DHCA_HUM   83 LDVLVNNAGI

IPB002198B A<->B (-2,6175):80
DHCA_HUMAN|P16152   173 KFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKIL
                        |||||||||||||||||||||||||||||||||||||||||||||||||
sp|P16152|DHCA_HUM  173 KFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKIL
------------------------------------------------------------------------------
>IPB004358 1/4 blocks Combined E-value= 0.01: Bacterial sensor protein C-terminal signature
Block       Frame  Location (aa)  Block E-value
IPB004358C    0    12-30          0.011

Other reported alignments:

                   |--- 670 amino acids---|
IPB004358          A......................B...............C.....................D
sp|P16152|DHCA_HUM C

IPB004358C <->C (17,3694):11
Q9A7L7              381 GGTGLGLAISRDLARLMGG
                        |  | |||| ||| ||  |
sp|P16152|DHCA_HUM   12 GNKGIGLAIVRDLCRLFSG
------------------------------------------------------------------------------
>IPB001294 1/14 blocks Combined E-value= 0.14: Phytochrome
Block       Frame  Location (aa)  Block E-value
IPB001294N    0    14-37          0.15

Other reported alignments:

                   |--- 475 amino acids---|
IPB001294          A:.BB:.CC:.DD.EE:FF.GG::HHH:III::.JJJ:::KKK:::LLMM::......N
sp|P16152|DHCA_HUM :N

IPB001294N <->N (898,1187):13
Q8GV69             1075 EGLGLNICRKLVRLMNGDVQYVRE
                         | || | | | ||  |||
sp|P16152|DHCA_HUM   14 kGIGLAIvRdLcRLfSGDVvLTaR
------------------------------------------------------------------------------
>IPB011489 1/1 blocks Combined E-value= 0.45: EMI
Block       Frame  Location (aa)  Block E-value
IPB011489     0    215-231        0.46

Other reported alignments:

IPB011489 <->
EGFL7_HUMAN|Q9UHF1   78 GLAPARPRYACCPGWKR
                                 |||||| |
sp|P16152|DHCA_HUM  215 RKgDKiLLNACCPGWvR

5 possible hits reported

2.1.4 SMART

Domains within Homo sapiens protein DHCA_HUMAN (P16152)

Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase)


Transmembrane segments are predicted by the TMHMM2 program, coiled coil regions by the Coils2 program, segments of low compositional complexity by the SEG program, and signal peptides by the SignalP program.

2.2 MOTIF ANALYSIS

2.2.1 MEME

MOTIF 1

DKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHG


MOTIF 2

CSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHAKEGWPDSAYGVTKI

MOTIF 3

PRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQ

2.2.2 MAST

gi|118519|sp|P16152|DHCA_HUMAN

Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+])
LENGTH = 277  COMBINED P-VALUE = 1.31e-61  E-VALUE = 2.5e-56
DIAGRAM: 218-[1]-9

                                                                         [1]
                                                                         5.7e-64
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

gi|1352256|sp|P48758|DHCA_MOUSE

                                                                         [1]
                                                                         5.7e-64
                                                                         DKILLNA
                                                                         +++++++
151  RLELQQKFRSETITEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW

gi|75061940|sp|Q5RCU5|DHCA_PONPY

                                                                         [1]
                                                                         2.6e-62
                                                                         DKILLNA
                                                                         + +++++
151  SPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDRILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

gi|1352258|sp|P47727|DHCA_RAT

                                                                         [1]
                                                                         5.3e-62
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     ++++++++++++++++++++++++++++++++++ ++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW

gi|1352257|sp|P47844|DHCA_RABIT

Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1)
LENGTH = 277  COMBINED P-VALUE = 1.55e-58  E-VALUE = 3e-53
DIAGRAM: 218-[1]-9

                                                                         [1]
                                                                         6.8e-61
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++ +++++++++++++++++++++++++++++
226  CCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW

gi|54035740|sp|Q28960|DHCA_PIG

                                                                         [1]
                                                                         6.9e-59
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFKSETITEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     ++++++++++++++++++++ ++++++++++++++++++++++
226  CCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA

gi|6014959|sp|O75828|DHC3_HUMAN

Carbonyl reductase [NADPH] 3 (NADPH-dependent carbonyl reductase 3)
LENGTH = 277  COMBINED P-VALUE = 4.79e-32  E-VALUE = 9.2e-27
DIAGRAM: 218-[1]-9

                                                                         [1]
                                                                         2.1e-34
                                                                         DKILLNA
                                                                         + +++++
151  SEDLQERFHSETLTEGDLVDLMKKFVEDTKNEVHEREGWPNSPYGVSKLGVTVLSRILARRLDEKRKADRILVNA

CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF ++++ + +++ + + +++++++++++++++++ + ++226 CCPGPVKTDMDGKDSIRTVEEGAETPVYLALLPPDATEPQGQLVHDKVVQNW
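The COMBINED P-VALUE and E-VALUE columns in the MAST output are related. Assuming MAST's usual definition (the E-value of a sequence is its combined p-value multiplied by the number of sequences in the database searched), the database size can be estimated from any reported pair. The sketch below does this for the three entries whose header lines survive in this report; the function name `implied_database_size` is an assumption introduced here, and the result is only approximate because the reported values are rounded.

```python
# (combined p-value, E-value) pairs copied from the MAST headers above.
REPORTED = {
    "DHCA_HUMAN": (1.31e-61, 2.5e-56),
    "DHCA_RABIT": (1.55e-58, 3e-53),
    "DHC3_HUMAN": (4.79e-32, 9.2e-27),
}


def implied_database_size(p_value, e_value):
    """Estimate N from E = p * N (assumed MAST definition)."""
    return e_value / p_value


for name, (p, e) in REPORTED.items():
    print(name, round(implied_database_size(p, e)))
```

All three pairs point to a database of roughly the same size, which suggests the three hits came from the same search.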

3. STRUCTURAL ANALYSIS TOOLS

3.1 SECONDARY STRUCTURE

3.1.1 GOR4

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
ccceeeeeeccccccceeeeeeeeeeccccceeeeecchhhhhhhhhhhhhhcccccccccchhhhhhhh
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
hhhhhhhhhcccceeeeccccceeeccccccchhhhhhhhhceeccccccccccccccccccceeeecce
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
eeeeeeccccchhhhhhhhcchhhhhhhhhhhhhhhhcceeeeeeccccceeeeeeeeeeeehhhhhhhh
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
hhhhhhcceeeeecccccceeecccccccccccccccccceeeeeccccccccccceeeeeeeeec

Sequence length : 276

GOR4 :
Alpha helix (Hh) : 78 is 28.26%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 81 is 29.35%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 117 is 42.39%
Ambigous states (?) : 0 is 0.00%
Other states : 0 is 0.00%

3.1.2 SOPMA

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
cttceeeeeecccttchhhhhhhhhhtttceeeeeccchhhhhhhhhhhhhttcccceeeecccchhhhh
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
hhhhhhhhtttcceeeeetttceeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhccttceeeeeech
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
hhhhhhhtcchhhhhhhhhhccchhhhhhhhhhhhhhhhttccccccccccceehhhheeeehhhhhhhh
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
hccttttceeeeeeccttceeecccccccccccccccccceeeeeccttccccccceeechhhhhh

Sequence length : 276

SOPMA :
Alpha helix (Hh) : 114 is 41.30%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 52 is 18.84%
Beta turn (Tt) : 28 is 10.14%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 82 is 29.71%
Ambigous states (?) : 0 is 0.00%
Other states : 0 is 0.00%

Parameters :
Window width : 17
Similarity threshold : 8
Number of states : 4

3.1.3 PHD

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
CCCCeEEEEcCChHHHHHHHHHHHHHHCCCeEEEEEcChhHHHHHHHHHHHcCCCceEEEecCCcHHHHH
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
HHHHHHHHHcCCCEEEEEcCCceecCCCCCCceEEEEEEEEeeccchHHHHHHHhHhhCCCCcEEEEEEe
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
hhhhccccCChhHHHhcCCCCCcHHHHHHHHHHHHhhHhccCCccCCCCCCCceeeeeeeeeeehHHHHH
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
HhhCCCCCeEEEcCCCCCceccCCCCCCCCCCCCCCCCcEEEEEEcCCCCCCCCCCeecCCcccCC

Sequence length : 276

PHD :
Alpha helix (Hh) : 89 is 32.25%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 65 is 23.55%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 122 is 44.20%
Ambigous states (?) : 0 is 0.00%
Other states : 0 is 0.00%

Residues with a scale reliability index of prediction of 5 and over (uppercase letters) are predicted at better than 82%.
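The percentages reported by GOR4, SOPMA and PHD follow directly from the residue counts and the sequence length of 276. The short sketch below recomputes them side by side so the three predictions can be compared; the dictionary `COUNTS` and the function name `percentages` are assumptions introduced here, and the counts are copied from the outputs above.

```python
# Residue counts per state reported by each predictor (276 residues in total).
COUNTS = {
    "GOR4":  {"helix": 78,  "strand": 81, "turn": 0,  "coil": 117},
    "SOPMA": {"helix": 114, "strand": 52, "turn": 28, "coil": 82},
    "PHD":   {"helix": 89,  "strand": 65, "turn": 0,  "coil": 122},
}


def percentages(counts):
    """Convert per-state residue counts to percentages of the total."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


for method, counts in COUNTS.items():
    print(method, sum(counts.values()), percentages(counts))
```

The helix content differs noticeably between the methods (about 28% for GOR4 against about 41% for SOPMA), so the predictions are best read together rather than individually.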

3.1.4 NN PREDICT

Tertiary structure class: none

Sequence:
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

Secondary structure prediction (H = helix, E = strand, - = no prediction):
----EEEEE-------EEEEHHHHHHH----EEEE---HHH-HHHHHHHH---------------HHHHHHHHHHHHHH----EEEE----HEEE----------H-HH-E----------HHH---------EEEE---HEHEE------HH----------HHHHHHHHHHHHHHHH---------------EEEEEEEEEHHHHH----H-----HHEE---------------------------HHEE-------------E------


3.1.5 JPRED

OrigSeq : SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW : OrigSeq

jalign : -----EEEEE----HHHHHHHHHHHH----EEEEE---HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE----------HH--HHHHHHHHHHHH--HHHHHHHHHHHH-----EEEEEEE----------HHHHHHHHHHHHHHHHHHHH-------------------HHHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE-------------------H-------------HHHHHHHHHH-------- : jalign

jfreq : -----EEEEE------HHHHHHHHHHH---EEEEEH-HHHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH---EEEEEH--------------HHHHHHHHHHHHHHHHHHHHHHHHHHH----EEEEEEE---------HHHHHHHHHHHHHHHHHHHHH----EEEEE-----------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE----------------HHHHHH--H-------HHHHHHHHHHH-------- : jfreq

jhmm : ----EEEEEE----HHHHHHHHHHHHH---EEEE----HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHH------EEEE--------------HHHHHHHHHHH--HHHHHHHHHHHHH-----EEEEEEEEEE---------HHHHHHHHHHHHHHHHHH----------------------HHHHHHHHHHHHHHHHHHHHH------EEEEEE-----------------HHHHHHHHHHHH---HHHHHHHHHHH-------- : jhmm

jnet : -----EEEEE----HHHHHHHHHHHHH---EEEEE--HHHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE--------------HHHHHHHHHHHHHHHHHHHHHHHHHHH---EEEEEEEEHHHH-----HHHHHHHHHHHHHHHHHHHHH-----EEE------------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE----------------HHHHHHHHHHHH----HHHHHHHHHEE-------- : jnet

jpssm : -----EEEEE----HHHHHHHHHHHHH---EEEE----HHHHHHHHHH----------H-----------------------------------------------HHH-HHHHHHHHHHHHHHHHHH----EEEEE---HHHHHH--------------HHHHHHHHHHH----------------------HHHHHHHHHHHHHHHHHHH-------EEEEEE-----------------HHHHHHHHHHHH---HHHHH---EEE-------- : jpssm

jpred : -----EEEEE----HHHHHHHHHHHHH---EEEEE---HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE--------------HHHHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEEEHHH-------HHHHHHHHHHHHHHHHHHHH--------------------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE-----------------HHHHHHHHHHH----HHHHHHHHHHH-------- : jpred

Jnet_25 : B---BBBBBBBBB-BBBBBBBBBBB--BBBBBBBBBB----B--BB--B----B-BBBBBBBBB---BB--BB--BB--BB-BBBBBBBBBBBBBBBBB-BBB-BBBBBBBBBBBBBBBBBBBBB-BB---BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB--BB-BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB-------BBBBBBBBBBBBBBBBB---B--BB--BB-BBBBBBBBB-BBB-BBBBBBB---BB-B : Jnet_25

Jnet_5 : ------BBBBBB---BB--BB--BB---B--BBBB------B--B---B------B-BB-B-B--------B---B--------BBBBBB---------------B--BB-BBB-BBBBBB-BBB--B-----BBBBBBBBBBB-----BBBBBB--BB-BBBBBBB--B------------------------BBBBBB-BBB-BBB--B--------BBBBBBBBB-B-B-------------BB--BB--B-----B--BBB-BB-------- : Jnet_5

Jnet_0 : ------BBBB------B--BB--BB------B---------B--B----------------------------------------------------------------------B---BB--B----------B------------------------------------------------------------B--B----B--BB-------------BBBB---------------------------------------B----------- : Jnet_0

Jnet Rel : 499908999728975799999999886799499755802179999875984776426677112237788888999998334440454022223455667666665811192225589999999998860697089834010111250330457789888912467787655234434222123455677236822478899999999999727189993499983588646656646981349999887886162843555277740347866388 : Jnet Rel

Notes

Key:

Colour code for alignment:
Blue - Complete identity at a position
Shades of red - The more red a position is, the higher the level of conservation of chemical properties of the amino acids

jalign - Jnet alignment prediction
jfreq - Jnet PSIBLAST frequency profile prediction
jhmm - Jnet hmm profile prediction
jnet - Jnet prediction
jpssm - Jnet PSIBLAST pssm profile prediction
jpred - Consensus prediction over all methods

MCoil - MultiCoil prediction (and dimer and trimer predictions)
Lupas - Lupas Coil prediction (window size of 14, 21 and 28)

Note on coiled coil predictions:
- = less than 50% probability
c = between 50% and 90% probability
C = greater than 90% probability

Jnet_25 - Jnet prediction of burial, less than 25% solvent accessibility
Jnet_5 - Jnet prediction of burial, less than 5% exposure
Jnet_0 - Jnet prediction of burial, 0% exposure
Jnet Rel - Jnet prediction of prediction accuracy, ranges from 0 to 9, bigger is better
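The jpred line is described as a consensus over the individual methods. How Jpred actually builds that consensus is not stated in this report, so the sketch below only illustrates the general idea with a simple column-wise majority vote; the function name `majority_consensus`, the tie rule and the toy input strings are all assumptions introduced here.

```python
from collections import Counter


def majority_consensus(predictions):
    """Column-wise majority vote over equal-length prediction strings.

    A column with a tie for first place is reported as '-' (no prediction).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("-")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Toy example with three short predictions (H = helix, E = strand, - = none).
toy = ["--EEEE-HHH", "-EEEEE-HHH", "--EEE--HH-"]
print(majority_consensus(toy))
```

Applied to the full-length jalign, jfreq, jhmm, jnet and jpssm strings above, the same function would give a rough consensus that can be compared with the reported jpred line.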

3.1.6 PREDICT PROTEIN

PROF results (normal)

AA SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

PROF_sec EEEEEE HHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHHH HHHHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHHHH EEEEE HHHH HHHHHHHHHH

SUB_sec LLL..EEEE..LL.HHHHHHHHHHHH.LLL.EEEE.LL..HHHHHHHHHH..LLL.EEEE..LLLHHHHHHHHHHHHHH.LL..EEE........LLL..LL......HH......HHHHHHHHHHH...LL.EEEE.......LLL.....HH.HHHHHHHHHHHHHHH...LL.....L.LL..LL........HHHHHHHHHHHHHHHHH...LLL.EEE.........LLLLLLLL........L....LLL..H............LLLLL

O_3_acc bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

P_3_acc e e bbbbbbb bbb bbb bbee e bbbbb e eebeebbe b eeebeb bb bebeeee be bbe b ee eebbbbbbbbbb e e ee e b bb bbb bbb bb bbe b eeb bbbbbbbbb e ee bb bbb bbb bbb bb ebbee ee ee e eeee e eb bb bbb bbb bbb bbee eee bbbbbbbbbbb beb eee eeeeee e b ee eebbebbbbbb eeeeeee

Rel_acc 705136999632021472289339852201397943332436245454835143433312120251616349354945411230999633111210001213132232422022235234934934361331399475411111011011133632286427532173314532212123000135110213112421832265306933253222312307674312012020101102332521022000102223221226304321410335

SUB_acc e.e..bbbbb.....bb..bb..bbe.....bbbb....e.b.ebbeib.e.e.e.........e.e.b.ib.eibiee.....bbbb....................b.......b..bb.ib.e.b.....bbbbbb..............b...bbi.bb...b...be.............e.........b..b...bb..bb...b.........bbbb..................e...................b..b...e....e

Sequence Details 1WMA

Chain A, representative of identical chains Chain A

Description Carbonyl reductase [NADPH] 1

Type polypeptide(L)

Polymer Id 1

Number of residues 276

Domains 1WMAA0: dp domain 1WMAA0


Sequence and Secondary Structure  

Key (symbols shown in the original figure): extended strand, turn, disulfide bond, alpha helix, 310 helix, pi helix. Greyed out residues have no structural information.

3.2 PROTEIN VISUALISATION TOOL

3.2.1 RASMOL


Ball and Stick

Cartoon


Strands

Space fill


4. SEQUENCE ANALYSIS

4.1 ORF PREDICTION TOOL

ORF FINDER

 94 atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag
     M  S  S  G  I  H  V  A  L  V  T  G  G  N  K
139 ggcatcggcttggccatcgtgcgcgacctgtgccggctgttctcg
     G  I  G  L  A  I  V  R  D  L  C  R  L  F  S
184 ggggacgtggtgctcacggcgcgggacgtgacgcggggccaggcg
     G  D  V  V  L  T  A  R  D  V  T  R  G  Q  A
229 gccgtacagcagctgcaggcggagggcctgagcccgcgcttccac
     A  V  Q  Q  L  Q  A  E  G  L  S  P  R  F  H
274 cagctggacatcgacgatctgcagagcatccgcgccctgcgcgac
     Q  L  D  I  D  D  L  Q  S  I  R  A  L  R  D
319 ttcctgcgcaaggagtacgggggcctggacgtgctggtcaacaac
     F  L  R  K  E  Y  G  G  L  D  V  L  V  N  N
364 gcgggcatcgccttcaaggttgctgatcccacaccctttcatatt
     A  G  I  A  F  K  V  A  D  P  T  P  F  H  I
409 caagctgaagtgacgatgaaaacaaatttctttggtacccgagat
     Q  A  E  V  T  M  K  T  N  F  F  G  T  R  D
454 gtgtgcacagaattactccctctaataaaaccccaagggagagtg
     V  C  T  E  L  L  P  L  I  K  P  Q  G  R  V
499 gtgaacgtatctagcatcatgagcgtcagagcccttaaaagctgc
     V  N  V  S  S  I  M  S  V  R  A  L  K  S  C
544 agcccagagctgcagcagaagttccgcagtgagaccatcactgag
     S  P  E  L  Q  Q  K  F  R  S  E  T  I  T  E
589 gaggagctggtggggctcatgaacaagtttgtggaggatacaaag
     E  E  L  V  G  L  M  N  K  F  V  E  D  T  K
634 aagggagtgcaccagaaggagggctggcccagcagcgcatacggg
     K  G  V  H  Q  K  E  G  W  P  S  S  A  Y  G
679 gtgacgaagattggcgtcaccgttctgtccaggatccacgccagg
     V  T  K  I  G  V  T  V  L  S  R  I  H  A  R
724 aaactgagtgagcagaggaaaggggacaagatcctcctgaatgcc
     K  L  S  E  Q  R  K  G  D  K  I  L  L  N  A
769 tgctgcccagggtgggtgagaactgacatggcgggacccaaggcc
     C  C  P  G  W  V  R  T  D  M  A  G  P  K  A
814 accaagagcccagaagaaggtgcagagacccctgtgtacttggcc
     T  K  S  P  E  E  G  A  E  T  P  V  Y  L  A
859 cttttgcccccagatgctgagggtccccatggacaatttgtttca
     L  L  P  P  D  A  E  G  P  H  G  Q  F  V  S
904 gagaagagagttgaacagtggtga 927
     E  K  R  V  E  Q  W  *
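The ORF Finder listing can be reproduced in outline with a few lines of code. The sketch below is not the ORF Finder algorithm itself: it only builds the standard genetic code table, translates a stretch of DNA codon by codon, and works out the ORF length from the reported coordinates (94 to 927). The names `translate` and `orf_summary` are assumptions introduced here.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def orf_summary(start, end):
    """Return (length in nt, codons including stop, amino acids) for an ORF."""
    length = end - start + 1
    codons = length // 3
    return length, codons, codons - 1


print(translate("atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag"))  # first row
print(translate("gagaagagagttgaacagtggtga"))                       # last row
print(orf_summary(94, 927))
```

The ORF of 834 nucleotides corresponds to 277 amino acids including the initiator methionine, which is consistent with the 276-residue sequence used elsewhere in this report if that methionine is not counted.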


4.2 SPLICE SITE PREDICTION

NetGene2

The sequence: Sequence has the following composition:

Length: 1209 nucleotides.
25.0% A, 25.3% C, 28.9% G, 20.8% T, 0.0% X, 54.2% G+C
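The composition line is a simple count of each base divided by the sequence length, and the G+C figure is the sum of the G and C percentages (25.3% + 28.9% = 54.2%). A minimal sketch of that calculation is given below; the function name `base_composition` is an assumption introduced here, and the example input is only the first 45 nucleotides of the ORF, not the full 1209 bp sequence.

```python
def base_composition(seq):
    """Return the percentage of A, C, G, T and G+C in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    comp = {base: 100.0 * seq.count(base) / len(seq) for base in "ACGT"}
    comp["G+C"] = comp["G"] + comp["C"]
    return comp


# First row of the ORF Finder listing (positions 94-138).
first_orf_row = "atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag"
print(base_composition(first_orf_row))
```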

Donor splice sites, direct strand
---------------------------------
pos 5'->3'  phase  strand  confidence  5'  exon intron  3'
       780      2       +        0.35  GCTGCCCAGG^GTGGGTGAGA
       833      1       +        0.36  CCAGAAGAAG^GTGCAGAGAC
       924      2       +        0.80  TTGAACAGTG^GTGAGCTGGG

Donor splice sites, complement strand
-------------------------------------
pos 3'->5'  pos 5'->3'  phase  strand  confidence  5'  exon intron  3'
      1055         155      2       -        0.41  TAGTACATTA^GTGAGTGCTA
       966         244      1       -        0.46  TCAGGACAAG^GTACAAAATG
       200        1010      1       -        0.39  GTCCCGCGCC^GTGAGCACCA

Acceptor splice sites, direct strand
------------------------------------
pos 5'->3'  phase  strand  confidence  5'  intron exon  3'
        69      0       +        0.00  CTCCACGCAG^GTGTTCCGCG
       115      1       +        0.18  ATCCATGTAG^CGCTGGTGAC
       512      2       +        0.19  ACGTATCTAG^CATCATGAGC
       521      2       +        0.19  GCATCATGAG^CGTCAGAGCC
       527      2       +        0.19  TGAGCGTCAG^AGCCCTTAAA
       529      1       +        0.19  AGCGTCAGAG^CCCTTAAAAG
       539      2       +        0.19  CCCTTAAAAG^CTGCAGCCCA
       545      2       +        0.19  AAAGCTGCAG^CCCAGAGCTG
       550      1       +        0.18  TGCAGCCCAG^AGCTGCAGCA
       552      0       +        0.18  CAGCCCAGAG^CTGCAGCAGA
       558      0       +        0.07  AGAGCTGCAG^CAGAAGTTCC
       871      1       +        0.33  TTGCCCCCAG^ATGCTGAGGG

Acceptor splice sites, complement strand
----------------------------------------
pos 3'->5'  pos 5'->3'  phase  strand  confidence  5'  intron exon  3'
       771         439      1       -        0.07  CCCTGGGCAG^CAGGCATTCA
       768         442      1       -        0.17  TGGGCAGCAG^GCATTCAGGA
       760         450      0       -        0.18  AGGCATTCAG^GAGGATCTTG
       757         453      0       -        0.19  CATTCAGGAG^GATCTTGTCC
       727         483      2       -        0.76  GCTCACTCAG^TTTCCTGGCG
       703         507      2       -        0.17  TCCTGGACAG^AACGGTGACG
       584         626      1       -        0.33  TCCTCCTCAG^TGATGGTCTC
       438         772      1       -        0.07  GGTACCAAAG^AAATTTGTTT
       413         797      2       -        0.92  GTCACTTCAG^CTTGAATATG
       399         811      1       -        0.34  AATATGAAAG^GGTGTGGGAT
       386         824      2       -        0.19  GTGGGATCAG^CAACCTTGAA
       375         835      1       -        0.18  AACCTTGAAG^GCGATGCCCG
       125        1085      1       -        0.16  TTGCCTCCAG^TCACCAGCGC

------------------------------------------------------------------------------

CUTOFF values used for confidence:

Highly confident donor sites (H): 95.0 %
Nearly all true donor sites: 50.0 %

Highly confident acceptor sites (H): 95.0 %
Nearly all true acceptor sites: 20.0 %


Graphics showing the prediction output

Direct strand ( + strand)

Complement strand ( - strand)


4.3 GENE FINDER

GENSCAN

GENSCAN 1.0 Date run: 13-Jun-106 Time: 05:17:52

Sequence 05:17:52 : 1209 bp : 54.18% C+G : Isochore 3 (51 - 57 C+G%)

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 1.01 Term +     70    927  858  0  0  113   49  1199 0.529 111.73
 1.02 PlyA +   1188   1193    6                                1.05

Predicted peptide sequence(s):

>05:17:52|GENSCAN_predicted_peptide_1|285_aa
VFRAPRSAMSSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

Explanation

Gn.Ex : gene number, exon number (for reference)
Type  : Init = Initial exon (ATG to 5' splice site)
        Intr = Internal exon (3' splice site to 5' splice site)
        Term = Terminal exon (3' splice site to stop codon)
        Sngl = Single-exon gene (ATG to stop)
        Prom = Promoter (TATA box / initiation site)
        PlyA = poly-A signal (consensus: AATAAA)
S     : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End   : end point of exon or signal (numbered on input strand)
Len   : length of exon or signal (bp)
Fr    : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph    : net phase of exon (exon length modulo 3)
I/Ac  : initiation signal or 3' splice site score (tenth bit units)
Do/T  : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P     : probability of exon (sum over all parses containing exon)
Tscr  : exon score (depends on length, I/Ac, Do/T and CodRg scores)
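The Len, Fr and Ph columns can be checked by hand from the Begin and End coordinates, using the definitions in the explanation above. The sketch below does this for the predicted terminal exon (70 to 927); the function name `exon_fields` is an assumption introduced here, and the frame rule applies to the forward strand only, as stated in the GENSCAN explanation.

```python
def exon_fields(begin, end):
    """Recompute GENSCAN's Len, Fr and Ph for a forward-strand exon."""
    length = end - begin + 1
    return {
        "Len": length,       # length of exon (bp)
        "Fr": end % 3,       # a codon ending at x has frame x mod 3
        "Ph": length % 3,    # net phase: exon length modulo 3
    }


fields = exon_fields(70, 927)
codons = fields["Len"] // 3       # includes the stop codon
print(fields, codons - 1)         # last value: predicted peptide length
```

The exon of 858 bp gives 286 codons, that is 285 amino acids plus the stop codon, which agrees with the "285_aa" label on the predicted peptide.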

Comments

The SCORE of a predicted feature (e.g., exon or splice site) is a log-odds measure of the quality of the feature based on local sequence properties. For example, a predicted 5' splice site with score > 100 is strong; 50-100 is moderate; 0-50 is weak; and below 0 is poor (more than likely not a real donor site).

The PROBABILITY of a predicted exon is the estimated probability under GENSCAN's model of genomic sequence structure that the exon is correct. This probability depends in general on global as well as local sequence properties, e.g., it depends on how well the exon fits with neighboring exons. It has been shown that predicted exons with higher probabilities are more likely to be correct than those with lower probabilities.


4.4 RESTRICTION MAPPING

NEBcutter

Display:
- NEB single cutter restriction enzymes
- Main non-overlapping, min. 100 aa ORFs

GC=54%, AT=46%


DISCUSSION

A. Database retrieval

The ExPASy primary accession number is P16152 and the protein length is 276 aa. Synonyms for carbonyl reductase [NADPH] 1 are NADPH-dependent carbonyl reductase 1, prostaglandin-E(2) 9-reductase, prostaglandin 9-ketoreductase and 15-hydroxyprostaglandin dehydrogenase [NADP+].

FUNCTION: Catalyzes the reduction of a wide variety of carbonyl compounds, including the antitumor anthracycline antibiotics. Can convert prostaglandin E2 to prostaglandin F2-alpha.

CATALYTIC ACTIVITY:

1) R-CHOH-R' + NADP+ = R-CO-R' + NADPH.

2) (13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.

3) (5Z,13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (5Z,13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.

According to NCBI and EMBL, the nucleotide sequence length is 1209 bp. The coding region of the cDNA runs from position 94 to position 927. The protein ID is reported as J04056.

The PDB ID is 1WMA. The structural name according to PDB is “Hydroxy-PP”. The molecular description is carbonyl reductase [NADPH] 1 and the functional class is oxidoreductase.

Molecular function: oxidoreductase activity.

According to the ENZYME database, the EC number is 1.1.1.184 and the reaction is R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH.

The approved HGNC symbol for this gene is CBR1

According to GeneCards, the gene is located on chromosome 21 at 21q22.13. It runs from 36,364,191 bp to 36,367,332 bp from pter, so the size of the gene is 3,141 bases. The gene is on the plus strand.

GenAtlas reports that this enzyme is intracellular and is found in the cytoplasm (cytosolic).

According to the ENSEMBL result, this gene is found on chromosome 21 at location 36,364,191-36,367,332. It has only one transcription site. The start of this gene is located in contig AP000688.1.1.171703.

According to Pfam, this family is a member of the FAD/NAD(P)-binding Rossmann fold superfamily clan. This clan includes the following Pfam members: Trp_halogenase; TrkA_N. The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least 2 domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains.

B. Tools and techniques

1. Homology and similarity

BLAST similarity searches found similar sequences in Mus musculus, rat and

rabbit, with similarity of 92%, 92% and 90% respectively. FASTA similarity searches found similar

sequences in bovine, rat and rabbit, with similarity of 95.652%, 94.565%

and 94.834% respectively.

A comparison between DHCA_HUMAN and DCXR_HUMAN was carried out using the

EMBOSS tools. The results showed a similarity of 34.1% in the global alignment and 38.4% in

the local alignment.
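The difference between a global and a local alignment can be sketched with a small dynamic-programming routine. This is only an illustration under stated assumptions: the function name align_score, the scoring values (match +1, mismatch -1, linear gap -2) and the toy sequences are hypothetical, and the EMBOSS programs use substitution matrices and affine gap penalties rather than this simplified scheme. It returns a raw score, not the percent similarity reported above.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Best pairwise alignment score with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch style recurrence).
    local=True:  local alignment (Smith-Waterman style recurrence).
    """
    # First row: leading gaps are penalised globally, free locally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)      # a local alignment may restart anywhere
                best = max(best, score)    # and may end anywhere
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


# Toy sequences sharing only a central "ACGT" core.
s1, s2 = "TTTACGTAAA", "CCCACGTGGG"
print("global:", align_score(s1, s2))
print("local: ", align_score(s1, s2, local=True))
```

Because the local variant is free to ignore the unrelated flanks, it scores the shared core alone, which is one reason a local alignment of two distantly related proteins can show a higher similarity than the global one.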

Multiple sequence alignment with ClustalW gave an alignment score of 14258.

The alignment score in T-Coffee is 81. In the alignment, “*” marks fully conserved columns and “.” marks columns with weakly similar residues.

2. Functional analysis

ScanProsite results revealed that the short-chain dehydrogenases/reductases family signature is

found at residues 180-208. The short-chain dehydrogenases/reductases family (SDR) [1] is a very large

family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases.

As the first member of this family to be characterized was Drosophila alcohol dehydrogenase,

this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most

members of this family are proteins of about 250 to 300 amino acid residues. Proteins currently

known to belong to this family include:

Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.

Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).

D-β-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
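A signature scan of the kind ScanProsite performs can be approximated by translating a PROSITE-style pattern into a regular expression. The sketch below is an assumption-laden illustration: the function name prosite_to_regex, the toy pattern and the toy sequence are hypothetical, and the pattern is NOT the actual SDR family signature. It handles only the basic syntax (x, [..], {..}, repeat counts and the < and > anchors as separate elements).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        element = element.replace("x", ".")                      # any residue
        element = element.replace("{", "[^").replace("}", "]")   # excluded residues
        element = element.replace("(", "{").replace(")", "}")    # repeat counts
        parts.append(element)
    return "".join(parts)


# Hypothetical toy pattern and sequence, for illustration only.
toy_pattern = "[ST]-x(2)-Y-x(3)-K-{P}."
toy_sequence = "MAGSAAYGVTKAL"

regex = prosite_to_regex(toy_pattern)
hit = re.search(regex, toy_sequence)
if hit:
    # Report 1-based, inclusive residue positions, as in the report above.
    print(regex, hit.start() + 1, hit.end(), hit.group())
```

Reporting positions as 1-based inclusive residue numbers matches the convention of the 180-208 range quoted for the real signature.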

The graphical representation of this pattern is shown in the InterPro result.

The Blocks result reported 5 possible hits. Six blocks are present in the glucose/ribitol

dehydrogenase family signature.

>IPB002347 6/6 blocks Combined E-value= 3e-36: Glucose/ribitol dehydrogenase family signature

Block        Frame   Location (aa)   Block E-value

IPB002347A   0       6-23            2e-07

IPB002347B   0       81-92           1.5e-06

IPB002347C   0       126-142         0.0028

IPB002347D   0       193-212         0.034

IPB002347E   0       218-235         0.00017

IPB002347F   0       236-256         0.074

The graphical representation of profiles in the SMART result confirmed the presence of Carbonyl

reductase [NADPH] 1.

According to the MEME result, three motifs are present in Carbonyl reductase [NADPH] 1. The

details of these three motif regions were collected from the MEME result and

submitted to MAST. These motifs were also found in some other enzymes having the same domains.

3. Structural analysis

Secondary structure was predicted using GOR IV, SOPMA and PHD. According to these tools, the

alpha helix content is between 28.26% and 41.30% and the beta sheet content is between 18.84%

and 29.35%. These results are confirmed by the NNpredict, Jpred and SOPMA results.

According to the sequence details from the PDB database, eight beta strands, eleven helices and nine turns

are present in Carbonyl reductase [NADPH] 1.

4. Sequence analysis

Based on ORF prediction, the coding region runs from position 94 to 927. NetGene2 predicted three

donor splice sites and twelve acceptor splice sites on the direct strand, and three donor splice

sites and thirteen acceptor splice sites on the complement strand.

The Genscan result revealed a single-exon gene (ATG to stop) from 94 to 927. The polyA

signal is predicted at 1188-1193.
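The coding coordinates can be checked against the protein length with simple arithmetic: positions 94 to 927 span 834 nucleotides, that is 278 codons, or 277 residues once the stop codon is set aside. The sketch below shows that calculation together with a naive forward-strand ORF scan. It is a simplified illustration and not the algorithm used by the ORF prediction tool or by Genscan; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Longest ATG..stop ORF on the forward strand.

    Returns 1-based inclusive (start, end) coordinates with the stop codon
    included, or None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[1] - best[0] + 1:
                    best = (start + 1, i + 3)
                start = None
    return best


def protein_length(start: int, end: int) -> int:
    """Residues encoded by a coding region whose end includes the stop codon."""
    return (end - start + 1) // 3 - 1


print(longest_orf("CCATGAAATTTTAGGG"))  # toy sequence, not the CBR1 cDNA
print(protein_length(94, 927))
```

The length obtained for 94-927 agrees with the 277-residue open reading frame described in the introduction.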

According to the NEBcutter restriction prediction tool, the GC content is 54% and the AT content is 46%. A recognition

site for BssHII is found at the N terminal. A recognition site for PspOMI1 is found at the C terminal.
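The base composition figures are straightforward to reproduce from a nucleotide sequence. A minimal sketch follows; the function name gc_percent and the toy sequences are hypothetical, and ambiguous bases such as N are simply left out of the denominator, which is an assumption rather than a documented behaviour of the tool.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


toy = "ATGCGCGGCTAA"  # toy sequence, not the CBR1 cDNA
gc = gc_percent(toy)
print(f"GC = {gc:.1f}%, AT = {100.0 - gc:.1f}%")
```

Applied to the full cDNA, the same calculation should give values close to the 54% and 46% reported above.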

CONCLUSIONS

Results from different databases revealed that carbonyl reductase catalyzes

the reduction of a wide variety of carbonyl compounds, including the antitumor anthracycline

antibiotics. It can convert prostaglandin E2 to prostaglandin F2-alpha. The gene is located on

chromosome 21. It spans 36,364,191 bp to 36,367,332 bp from pter. The size of the gene is

3141 bases. The gene lies on the plus strand. The protein length is 276 aa. The approved HGNC

symbol for this gene is CBR1. The functional class is oxidoreductase. Reaction catalyzed:

R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH.

Sequence, structure and functional analysis of carbonyl reductase showed that similar sequences

are found in Mus musculus, rat and rabbit, with similarity of 92%, 92% and 90%

respectively (based on the BLAST result). In FASTA similarity searches, similar sequences are

found in bovine, rat and rabbit, with similarity of 95.652%, 94.565% and 94.834% respectively.

The short-chain dehydrogenases/reductases family signature is found at residues 180-208. The short-chain

dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which

are known to be NAD- or NADP-dependent oxidoreductases. Eight beta strands, eleven helices

and nine turns are present in Carbonyl reductase [NADPH] 1. The Genscan result revealed a

single-exon gene (ATG to stop) from 94 to 927. The GC content is 54% and the AT content is 46%.

REFERENCES

1. Avramopoulos, D.; Cox, T.; Forrest, G. L.; Chakravarti, A.;

Antonarakis, S. E. :

Linkage mapping of the carbonyl reductase (CBR) gene on human

chromosome 21 using a DNA polymorphism in the 3-prime

untranslated region. Genomics 13: 447-448, 1992.

2. Forrest, G. L.; Akman, S.; Krutzik, S.; Paxton, R. J.; Sparkes, R.

S.; Doroshow, J.; Felsted, R. L.; Glover, C. J.; Mohandas, T.;

Bachur, N. R. :

Induction of a human carbonyl reductase gene located on

chromosome 21. Biochim. Biophys. Acta 1048: 149-155, 1990.

3. Lemieux, N.; Malfoy, B.; Forrest, G. L. :

Human carbonyl reductase (CBR) localized to band 21q22.1 by high-

resolution fluorescence in situ hybridization displays gene

dosage effects in trisomy 21 cells. Genomics 15: 169-172, 1993.

4. Watanabe, K.; Sugawara, C.; Ono, A.; Fukuzumi, Y.; Itakura, S.;

Yamazaki, M.; Tashiro, H.; Osoegawa, K.; Soeda, E.; Nomura, T. :

Mapping of a novel human carbonyl reductase, CBR3, and ribosomal

pseudogenes to human chromosome 21q22.2. Genomics 52: 95-100,

1998.

5. Wei, J.; Dlouhy, S. R.; Hara, A.; Ghetti, B.; Hodes, M. E. :

Cloning a cDNA for carbonyl reductase (Cbr) from mouse

cerebellum: murine genes that express Cbr map to chromosomes 16

and 11. Genomics 34: 147-148, 1996.

6. Wermuth, B.; Bohren, K. M.; Heinemann, G.; von Wartburg, J.-P.;

Gabbay, K. H. :

Human carbonyl reductase: nucleotide sequence analysis of a cDNA

and amino acid sequence of the encoded protein. J. Biol. Chem.

263: 16185-16188, 1988.
