Protein structural analysis
INTRODUCTION
Gene map locus 21q22.12
Carbonyl reductase (EC 1.1.1.184) is one of several monomeric, NADPH-dependent
oxidoreductases having wide specificity for carbonyl
compounds that are generally referred to as aldoketoreductases. Others
include aldehyde reductase (EC 1.1.1.2; 103830) and aldose reductase (EC
1.1.1.21; 103880). Wermuth et al. (1988) isolated and characterized a cDNA
complementary to carbonyl reductase mRNA from a human placenta cDNA
library. The cDNA contained an open reading frame encoding a protein
of 277 amino acids with a molecular weight of 30,375.
Comparison of the predicted protein sequence with the primary
structures of other aldoketoreductases showed no significant
homologies. A possible homology, on the other hand, was found between
carbonyl reductase and 'short' subunit alcohol/polyol dehydrogenases.
Carbonyl reductase catalyzes the reduction of a great variety of
carbonyl compounds, e.g., quinones derived from polycyclic aromatic
hydrocarbons, 9-ketoprostaglandins, and the antitumor anthracycline
antibiotics daunorubicin and doxorubicin. The enzyme is widely
distributed in human tissues and also occurs in other mammalian and
nonmammalian species.
Forrest et al. (1990) cloned a carbonyl reductase cDNA of 1,219 base pairs from a
breast cancer cell line. Southern analysis of
genomic DNA digested with several restriction enzymes and analyzed by
hybridization with a labeled cDNA probe indicated that carbonyl
reductase is probably coded by a single gene and does not belong to a
family of structurally similar enzymes. Southern analysis of 17
mouse/human somatic cell hybrids showed that carbonyl reductase is
located on chromosome 21. Carbonyl reductase mRNA was induced 3- or 4-fold
within 24 hours by BHA, beta-naphthoflavone, or Sudan 1. Avramopoulos
et al. (1992) confirmed assignment to chromosome 21 by genetic linkage
Centre for Bioinformatics(1)
mapping using a DNA polymorphism from the 3-prime untranslated region
of the CBR gene. They demonstrated, furthermore, that the gene lies
between that for interferon-alpha receptor (107450) and D21S55, being
about 3.4 and 7.2 cM, respectively, from the 2 flanking loci. The
findings placed CBR in the telomeric band 21q22.3. By high-resolution
fluorescence in situ hybridization, Lemieux et al. (1993) mapped the CBR
gene to 21q22.12, very close to the SOD1 locus at position 21q22.11.
CBR displayed gene dosage effects in trisomy 21 human lymphoblasts at
both the DNA and the mRNA levels. With increasing chromosome 21
ploidy, lymphoblasts also showed increased aldo-keto reductase
activity and increased quinone reductase activity. Both of these
activities have been shown to be associated with carbonyl reductase.
The location of CBR near SOD1 and the increased enzyme activity and
potential for free radical modulation in trisomy 21 cells implicate
CBR as a candidate for contributing to the pathology of Down syndrome.
STRUCTURE OF PROTEIN
METHODOLOGY
Because the amount of information on the web is huge, numerous search engines exist to aid information retrieval. Several of them, such as Google, AltaVista, Infoseek and HotBot, are widely used by internet users.
A. DATABASE RETRIEVAL
RETRIEVAL OF INFORMATION FROM EXPASY
Type www.expasy.org in the address column
(OR)
Go to google.com and type “Expasy”. Select the proteomics server link.
On the proteomics server page, enter your protein name in the search box at the top and click “GO”.
On the result page, identify your protein by carefully reading the name and description, then click its protein ID.
Save the Expasy result page and note the protein ID. From now on you can type the protein ID directly into the search box to reach this page.
Read through the Expasy result page; links to all the other databases appear under the heading “Cross-references”.
Model Expasy page “Cross-references”
Cross-references
Sequence databases
EMBL
Y00970; CAA68784.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54017; CAA37964.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54018; CAA37964.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54019; CAA37964.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54020; CAA37964.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M77378; AAA51572.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M77379; AAA51573.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M77380; AAA51574.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M77381; AAA51575.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X66188; CAA46956.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54018; CAA46956.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54019; CAA46956.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
X54020; CAA46956.1; JOINED; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
PIR S11674; S11674.
UniGene Hs.370870
3D structure databases
HSSP P08001; 1FIZ. [HSSP ENTRY / PDB]
SMR P10323; 43-301.
ModBase P10323.
Protein-protein interaction databases
DIP P10323.
Protein family/group databases
MEROPS S01.223; -.
2D gel databases
SWISS-2DPAGE Get region on 2D PAGE.
Organism-specific gene databases
HGNC HGNC:126; ACR.
GeneCards ACR.
GeneLynx ACR; Homo sapiens.
GenAtlas ACR.
MIM 102480; gene. [NCBI / EBI]
HOVERGEN [Family / Alignment / Tree]
Gene expression databases
CleanEx HGNC:126; ACR.
Ontologies
GO
GO:0004284; Molecular function: acrosin activity (traceable author statement).
GO:0005515; Molecular function: protein binding (inferred from physical interaction).
GO:0007340; Biological process: acrosome reaction (traceable author statement).
QuickGo view.
Family and domain databases
InterPro
IPR012267; Pept_S1A_acrosin.
IPR009003; Pept_Ser_Cys.
IPR001254; Peptidase_S1_S6.
IPR001314; Peptidase_S1A.
Graphical view of domain structure.
Pfam PF00089; Trypsin; 1.
Pfam graphical view of domain structure.
PIRSF PIRSF001141; Acrosin; 1.
PRINTS PR00722; CHYMOTRYPSIN.
SMART SM00020; Tryp_SPc; 1.
SMART graphical view of domain structure.
PROSITE
PS50240; TRYPSIN_DOM; 1.
PS00134; TRYPSIN_HIS; 1.
PS00135; TRYPSIN_SER; 1.
PROSITE graphical view of domain structure (profiles).
ProDom [Domain structure / List of seq. sharing at least 1 domain]
BLOCKS P10323.
Genome annotation databases
Ensembl ENSG00000100312; Homo sapiens. [Contig view]
Other
SOURCE ACR; Homo sapiens.
ProtoNet P10323.
UniRef View cluster of proteins with at least 50% / 90% / 100% identity.
RETRIEVAL OF SEQUENCE FROM NCBI
Collection of nucleotide sequence from NCBI
Go to the Expasy page, right-click the first “GENBANK” link under Cross-references and open it in a new window.
On the result page, click the “Go” button next to the search box at the top.
Click the entry ID again to open the GenBank page and save it.
In the Display menu select FASTA.
Copy the FASTA-format record starting from the “>” sign, paste it into a new Notepad window and
save it under the file name NUCLEOTIDE FASTA.
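Once the FASTA text is saved, it can be checked programmatically. The following is a minimal Python sketch and not part of the original protocol: the function name `parse_fasta` and the example header text are illustrative assumptions, and the example holds only the first 60 bases of the J04056 record shown in the results.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical header; the bases are the first 60 of the J04056 mRNA record.
example = """>J04056.1 example header (first 60 bases only)
cagactcgagcagtctctggaacacgctgc
ggggctcccgggcctgagccaggtctgttc
"""

records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

Reading the saved NUCLEOTIDE FASTA file into `parse_fasta` in the same way gives a quick check that the header line and the sequence length match the database entry.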
Collection of protein sequence from NCBI
Go to the GenBank page and click the protein ID link.
Save the page in GenPept format.
RETRIEVAL OF SEQUENCE FROM EMBL
EMBL: The European Molecular Biology Laboratory (http://www.embl-heidelberg.de/) is a widely used site for
information retrieval. It supports various databases and thus serves as an integrated information retrieval
platform that lets the user access those databases seamlessly.
Collection of nucleotide sequence from EMBL
Go to the Expasy page, right-click the first “EMBL” link under Cross-references and open it in a new window.
On the result page, copy the ID number, paste it into the search box at the top and click “GO”.
The entry page lists a series of hyperlinked sequence IDs (mostly ordered by
proximity to your protein name). Look for your protein and follow its hyperlink.
Save it in EMBL format.
Collection of protein sequence from EMBL
Go to the EMBL page and click the protein ID link.
Save the page in UniProt format.
RETRIEVAL OF INFORMATION FROM OTHER DATABASES
Right-click the PDB link under 3D structure databases, open it in a new window and save it.
Similarly, open the ENZYME, HGNC, GeneCards, GenAtlas, MIM, Pfam and Ensembl pages in
new windows by right-clicking the ID numbers on the right-hand side, and save the pages. Copy
the HGNC page and paste it into a Word document.
Right-click the FASTA format link in the bottom right-hand corner of the Expasy page and open it
in a new window. Copy the FASTA-format sequence and paste it into Notepad. Save it as fasta protein.
TOOLS AND TECHNIQUES
1. HOMOLOGY AND SIMILARITY TOOLS
1.1 PAIR WISE SEQUENCE ANALYSIS
1.1.1 BLAST
Click the link http://www.ncbi.nlm.nih.gov
Click the BLAST link.
Click the protein-protein query
Submit protein sequence in FASTA format in submission box.
Run Blast.
In result page click “FORMAT” icon.
Save web page as BLAST RESULT.
1.1.2 FASTA
Click the link http://sbr.ebi.ac.uk
Click the button TOOLS
Click homology and similarity.
Select FASTA
Open FASTA submission form
(Or)
Go to google.com and type “Fasta”. Click the link Fasta similarity searching against protein databases.
Paste the protein sequence in FASTA format.
Run FASTA.
In the result page click “Fasta result” Icon
Save the web page as FASTA RESULT.
1.1.3 EMBOSS ALIGN
Type http://www.ebi.ac.uk/emboss/align in the address column
(OR)
Type “Emboss align” in google.com and select the link Pairwise Alignment algorithms form .
For a global alignment, select “Needle” in the submission form and submit the fasta protein sequence in the first box.
In the second box submit one more sequence for comparison and click “RUN”.
For a local alignment, select “Water” in the submission form and submit the fasta protein sequence in the first box.
In the second box submit one more sequence for comparison and click “RUN”.
Save the results.
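Needle performs a global (Needleman-Wunsch) alignment and Water a local (Smith-Waterman) alignment. The Python sketch below illustrates the difference between the two recurrences under a deliberately simplified scoring scheme that is an assumption of this example (match +1, mismatch -1, linear gap -1); the EMBOSS programs themselves use a substitution matrix and separate gap-open and gap-extension penalties, so the scores here will not match the server output.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, so an alignment can restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(global_score("ACGT", "AGT"))
print(local_score("TTTGATTACATTT", "CCGATTACACC"))
```

The global score penalises every unaligned end, whereas the local score only reflects the best-matching region, which is why Water is the better choice when two proteins share a single domain.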
1.2 MULTIPLE SEQUENCE ALIGNMENT
1.2.1 CLUSTALW
Type www.ebi.ac.uk/clustalw in the address column
(OR)
Type “ClustalW” in google.com. Select the link “ClustalW”.
Submit multiple protein sequences in fasta format and click “Run” icon.
In the result page click “Show Color” and save the result page.
1.2.2 T-COFFEE
Type http://www.ch.embnet.org/software/TCoffee.html in the address box.
(OR)
Type T-Coffee in google.com. Select the link T-COFFEE server.
Submit multiple protein sequences in fasta format and click “Run T-Coffee” icon. Save the
result web page.
2. FUNCTIONAL ANALYSIS TOOLS
2.1 PATTERN SEARCH
2.1.1 SCANPROSITE
Type http://www.expasy.org/tools/scanprosite in the address column.
(OR)
Type “ScanProsite” in google.com. Select the link ExPASy - ScanProsite.
Submit your protein sequence in fasta format. Click “Start the scan” icon.
Save the result. Then click the PROSITE ID link (blue hyperlink) and open its documentation page.
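ScanProsite reports where PROSITE patterns occur in the submitted sequence. The Python sketch below shows the idea by translating a pattern into a regular expression. It is an illustration only, with assumptions stated here: the helper names `prosite_to_regex` and `scan` are invented for this example, only a subset of the pattern syntax is handled, and the N-glycosylation pattern N-{P}-[ST]-{P} is used because it is short, not because it is the pattern relevant to carbonyl reductase. The test fragment is taken from residues of the P16152 sequence shown in the results.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern, e.g. 'N-{P}-[ST]-{P}', to a regex."""
    element_re = re.compile(
        r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    )
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} means any residue except P
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(core + repeat)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


fragment = "QGRVVNVSSIMSVRALK"
print(scan(fragment, "N-{P}-[ST]-{P}"))
```

Short patterns such as this one match many sequences by chance, so a hit is a hint to follow up in the documentation page and not proof of function.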
2.1.2 INTERPRO
Type http://www.ebi.ac.uk/InterProScan in the address column.
(OR)
Type “Interproscan” in google.com and select the link InterProScan.
Submit protein sequence in fasta format. Click “Submit Job” icon and Save the result.
2.1.3 BLOCK
Type http://bioinformatics.weizmann.ac.il/blocks/blocks_search.html in the address column.
(OR)
Type “Block Server” in google.com and select the link Block Search .
Submit protein sequence in fasta format. Click “Perform Search” icon and Save the result.
2.1.4 SMART
Type http://smart.embl-heidelberg.de in the address column.
(OR)
Type “Smart” in google.com and select the link SMART : Main page .
Select “Normal mode” and Submit protein sequence in fasta format.
Click “Sequence Smart” icon. Copy the webpage and paste it in word document.
2.2 MOTIF SEARCH
2.2.1 MEME
Type http://meme.sdsc.edu/meme/meme.html in the address column.
(OR)
Type “MEME” in google.com and select the link MEME - Submission form
Enter your E-mail (compulsory) and Submit multiple protein sequences in fasta format. Click
“Start Search”. Go to your mail box and collect your meme result.
2.2.2 MAST
Open a new Notepad file and type ALPHABET= ACDEFGHIKLMNPQRSTVWY
Go through the MEME result page. Under the heading “Motif 1 position-specific scoring matrix”,
click the “ViewPSSM1” icon, copy the motif 1 details and paste them into the same Notepad file. Similarly,
under the headings “Motif 2 position-specific scoring matrix” and “Motif 3 position-specific scoring matrix”, click the
“ViewPSSM2” and “ViewPSSM3” icons and paste the details into the same file. Save the
file as “Motif Details”.
Type http://meme.sdsc.edu/meme/mast.html in the address column.
(OR)
Type “MEME” in google.com and select the link MEME - Submission form
Under “menu” click “submit a job” and select mast.
In Mast submission form enter your E-mail (compulsory).
Browse “Motif Details” notepad file through submission form.
Select “Swissprot” database under Mast database column.
Click “Start Search”. Go to your mail box and collect your MAST result.
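MAST searches a database with the position-specific scoring matrices collected above. As a rough illustration of how such a matrix scores a sequence, the Python sketch below slides a motif along a sequence and sums one score per column. Everything in it is an assumption made for illustration: the three-column matrix has invented values and is not MEME output, the function name `best_pssm_hit` is hypothetical, and MAST's actual scoring and statistics are more involved.

```python
def best_pssm_hit(sequence, pssm, default=-1.0):
    """Slide the motif over the sequence; return (best score, 1-based start)."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Each column contributes the score of the residue found at that position.
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start + 1)
    return best


# Invented toy matrix: one dict per motif column, unlisted residues score -1.
toy_pssm = [{"G": 2.0}, {"A": 1.0, "S": 1.0}, {"G": 2.0}]

print(best_pssm_hit("AAGSGAA", toy_pssm))
```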
3. STRUCTURAL ANALYSIS TOOLS
3.1 SECONDARY STRUCTURE PREDICTION
3.1.1 GOR IV
Type http://npsa-pbil.ibcp.fr in the address column
(OR)
Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/
Under Secondary structure prediction click GOR IV.
Submit raw protein sequence. Click “Submit” icon.
Copy the result and paste it in word document.
3.1.2 SOPMA
Type http://npsa-pbil.ibcp.fr in the address column
(OR)
Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/
Under Secondary structure prediction click SOPMA.
Submit raw protein sequence. Click “Submit” icon.
Copy the result and paste it in word document.
3.1.3 PHD
Type http://npsa-pbil.ibcp.fr in the address column
(OR)
Type “npsa” in google.com and select the link npsa-pbil.ibcp.fr/
Under Secondary structure prediction click PHD.
Submit raw protein sequence. Click “Submit” icon.
Copy the result and paste it in word document.
3.1.4 NN PREDICT
Type http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html in the address column
(OR)
Type “nnpredict” in google.com and select the link nnpredict input form .
Submit raw protein sequence. Click “Submit” icon.
Save the result.
3.1.5 JPRED
Type http://www.compbio.dundee.ac.uk/~www-jpred/submit.html in the address column
(OR)
Type “JPRED” in google.com and select the link Jpred submission form
Enter your e-mail (compulsory). Submit raw protein sequence.
Tick the check box in the fourth column.
Click “Run secondary structure prediction” icon.
Wait until you get the result page and save the page.
3.1.6 PREDICT PROTEIN
Type http://www.predictprotein.org/meta in the address column
(OR)
Type “predictprotein server” in google.com and select the link META II - PredictProtein server
Enter your e-mail (compulsory). Submit raw protein sequence.
Enable JPRED, PHD, PROF and PSIPRED under the heading Protein structure.
Click “Submit /Run prediction” icon.
Go to your mailbox and collect the result by clicking the link.
Save the result page.
3.2 PROTEIN VISUALISATION TOOL
3.2.1 RASMOL
Type http://www.rcsb.org/pdb/Welcome.do in the address column
In PDB page enter protein PDB ID (For example “1BAK”) and click “search”.
In result page click all images under the heading “DISPLAY OPTIONS”
Under the heading “Display Molecule” in left-hand side of the webpage Click Rasmol Viewer
link and download PDB file. Save it in desktop.
Open PDB file through Rasmol window.
Go to display mode and select Ball and Stick, Cartoon, Strands and Space fill one by one.
Copy protein structure under different modes and paste it in word document
4. SEQUENCE ANALYSIS TOOLS
4.1 ORF PREDICTION TOOL
ORF FINDER (NCBI)
Type http://www.ncbi.nlm.nih.gov/projects/gorf/ in the address column
(OR)
Type “NCBI ORF Finder” in google.com and select the link ORF Finder
Submit raw nucleotide sequence and click OrfFind icon.
In result page click first ORF graphical picture.
Copy the result and paste it in word document.
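To make clear what an ORF prediction involves, the Python sketch below scans a nucleotide sequence for stretches that begin with ATG and run in frame to a stop codon. It is a simplified illustration under stated assumptions: the function name `find_orfs` and the `min_codons` threshold are invented here, only the three forward reading frames and the standard stop codons are considered, and the toy sequence is not from the carbonyl reductase record.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive; end includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs


# Toy sequence: ATG, three GCT codons and a TAA stop, offset by two bases.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
```

Applied to the saved nucleotide sequence, the longest hit can then be compared with the first ORF shown graphically on the ORF Finder result page.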
4.2 SPLICE SITE PREDICTION TOOL
NETGENE2
Type http://www.cbs.dtu.dk/services/NetGene2/ in the address column
(OR)
Type “Netgene2” in google.com and select the link NetGene2 Server
Select organism name. Submit raw nucleotide sequence
Click “Send file” icon and save the result.
4.3 GENE PREDICTION TOOL
GENSCAN
Type http://genes.mit.edu/GENSCAN.html in the address column
(OR)
Type “Genscan” in google.com and select the link New GENSCAN Web Server at MIT
Select organism name. Submit raw nucleotide sequence
Click “Run GENSCAN” icon and save the result.
4.4 RESTRICTION MAPPING NEB CUTTER
Type http://tools.neb.com/NEBcutter2/index.php in the address column
(OR)
Type “NEBcutter” in google.com and select the link NEBcutter V2.0
Submit raw nucleotide sequence. Click “Submit” icon and save the result.
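Restriction mapping amounts to locating enzyme recognition sequences in the submitted DNA. The Python sketch below does this for three common enzymes whose recognition sites are well known (EcoRI GAATTC, BamHI GGATCC, PstI CTGCAG). It is an illustrative assumption of this example and not the NEBcutter method: the function name `find_sites` is hypothetical, only the recognition sequence is located (not the exact cut position), and because all three sites are palindromic only the forward strand is searched.

```python
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "PstI": "CTGCAG"}


def find_sites(dna, sites=SITES):
    """Return {enzyme: [1-based start positions]} for each recognition site."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        i = dna.find(site)
        while i != -1:
            positions.append(i + 1)
            # Continue one base further so overlapping sites are not missed.
            i = dna.find(site, i + 1)
        hits[enzyme] = positions
    return hits


toy = "aaGAATTCttCTGCAGgg"
print(find_sites(toy))
```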
RESULTS
A. DATABASE RETRIEVAL
Expasy
UniProtKB/Swiss-Prot entry P16152
Entry information
Entry name DHCA_HUMAN
Primary accession number P16152
Secondary accession numbers None
Integrated into Swiss-Prot on April 1, 1990
Sequence was last modified on April 1, 1993 (Sequence version 2)
Annotations were last modified on March 7, 2006 (Entry version 66)
Name and origin of the protein
Protein name Carbonyl reductase [NADPH] 1
Synonyms EC 1.1.1.184
NADPH-dependent carbonyl reductase 1
Prostaglandin-E(2) 9-reductase
EC 1.1.1.189
Prostaglandin 9-ketoreductase
15-hydroxyprostaglandin dehydrogenase [NADP+]
EC 1.1.1.197
Gene name Name: CBR1
Synonyms: CBR, CRN
From Homo sapiens (Human) [TaxID: 9606]
Taxonomy Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.
References
[1] NUCLEOTIDE SEQUENCE [MRNA], AND PARTIAL PROTEIN SEQUENCE.
TISSUE=Placenta;
PubMed=3141401
Wermuth B., Bohren K.M., Heinemann G., von Wartburg J.-P., Gabbay K.H.;
"Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein.";
J. Biol. Chem. 263:16185-16188(1988).
[2] NUCLEOTIDE SEQUENCE, AND PARTIAL PROTEIN SEQUENCE.
TISSUE=Mammary gland;
DOI=10.1016/0167-4781(90)90050-C; PubMed=2182121
Forrest G.L., Akman S., Krutzik S., Paxton R.J., Sparkes R.S., Doroshow J., Felsted R.L., Mohandas T., Bachur N.R.;
"Induction of a human carbonyl reductase gene located on chromosome 21.";
Biochim. Biophys. Acta 1048:149-155(1990).
Comments
FUNCTION: Catalyzes the reduction of a wide variety of carbonyl compounds including the antitumor anthracycline antibiotics. Can convert prostaglandin E2 to prostaglandin F2-alpha.
CATALYTIC ACTIVITY: R-CHOH-R' + NADP+ = R-CO-R' + NADPH.
CATALYTIC ACTIVITY: (13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
CATALYTIC ACTIVITY: (5Z,13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (5Z,13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
SUBUNIT: Monomer.
SUBCELLULAR LOCATION: Cytoplasm.
SIMILARITY: Belongs to the short-chain dehydrogenases/reductases (SDR) family.
Cross-references
Sequence databases
EMBL
J04056; AAA52070.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
M62420; AAA17881.1; -; Unassigned_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AB003151; BAA33498.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AP000688; BAA89424.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BT019843; AAV38646.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
CR541708; CAG46509.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
AP001724; BAA95508.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BC002511; AAH02511.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
BC015640; AAH15640.1; -; mRNA. [EMBL / GenBank / DDBJ] [CoDingSequence]
PIR A61271; RDHUCB.
3D structure databases
PDB 1WMA; X-ray; A=1-276. [ExPASy / RCSB / EBI]
ModBase P16152.
Protein-protein interaction databases
IntAct P16152; -.
DIP P16152.
2D gel databases
SWISS-2DPAGE Get region on 2D PAGE.
Organism-specific gene databases
H-InvDB HIX0016099; -.
HGNC HGNC:1548; CBR1.
GeneCards CBR1.
GeneLynx CBR1; Homo sapiens.
GenAtlas CBR1.
MIM 114830; gene. [NCBI / EBI]
HOVERGEN [Family / Alignment / Tree]
Gene expression databases
CleanEx HGNC:1548; CBR1.
Ontologies
GO
GO:0004090; Molecular function: carbonyl reductase (NADPH) activity (traceable author statement).
QuickGo view.
Family and domain databases
InterPro
IPR002347; ADH_short_C2.
IPR002198; SDR.
Graphical view of domain structure.
PANTHER PTHR19410; ADH_short; 2.
Pfam PF00106; adh_short; 1.
Pfam graphical view of domain structure.
PRINTS
PR00081; GDHRDH.
PR00080; SDRFAMILY.
PROSITE PS00061; ADH_SHORT; 1.
ProDom [Domain structure / List of seq. sharing at least 1 domain]
BLOCKS P16152.
Genome annotation databases
Ensembl ENSG00000159228; Homo sapiens. [Contig view]
Other
LinkHub P16152; -.
SOURCE CBR1; Homo sapiens.
ProtoNet P16152.
UniRef View cluster of proteins with at least 50% / 90% / 100% identity.
Keywords
3D-structure; Acetylation; Direct protein sequencing; NADP; Oxidoreductase.
Features
Key        From   To   Length   Description                                     FTId
INIT_MET      0    0
CHAIN         1  276      276   Carbonyl reductase [NADPH] 1.                   PRO_0000054602
NP_BIND       9   33       25   NADP (By similarity).
ACT_SITE    193  193            Proton acceptor (By similarity).
BINDING     139  139            Substrate (By similarity).
MOD_RES       1    1            N-acetylserine.
MOD_RES     238  238            N6-1-carboxyethyl lysine.
STRAND        6   11        6
STRAND       13   14        2
HELIX        15   27       13
Sequence information
Length: 276 AA [This is the length of the unprocessed precursor]
Molecular weight: 30244 Da [This is the MW of the unprocessed precursor]
CRC64: 78E83065F5677733 [This is a checksum on the sequence]
        10         20         30         40         50         60
SSGIHVALVT GGNKGIGLAI VRDLCRLFSG DVVLTARDVT RGQAAVQQLQ AEGLSPRFHQ
        70         80         90        100        110        120
LDIDDLQSIR ALRDFLRKEY GGLDVLVNNA GIAFKVADPT PFHIQAEVTM KTNFFGTRDV
       130        140        150        160        170        180
CTELLPLIKP QGRVVNVSSI MSVRALKSCS PELQQKFRSE TITEEELVGL MNKFVEDTKK
       190        200        210        220        230        240
GVHQKEGWPS SAYGVTKIGV TVLSRIHARK LSEQRKGDKI LLNACCPGWV RTDMAGPKAT
       250        260        270
KSPEEGAETP VYLALLPPDA EGPHGQFVSE KRVEQW
P16152 in FASTA format
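The numbered display above is convenient to read but cannot be pasted directly into tools that expect a raw or FASTA sequence. The Python sketch below strips the position numbers and spaces and re-wraps the residues. It is an illustration added here, with assumptions stated plainly: the function names `clean_sequence` and `to_fasta` are hypothetical, and the bare header "P16152" is a placeholder, since a real UniProt FASTA header carries more fields.

```python
DISPLAY = """
        10         20         30         40         50         60
SSGIHVALVT GGNKGIGLAI VRDLCRLFSG DVVLTARDVT RGQAAVQQLQ AEGLSPRFHQ
        70         80         90        100        110        120
LDIDDLQSIR ALRDFLRKEY GGLDVLVNNA GIAFKVADPT PFHIQAEVTM KTNFFGTRDV
       130        140        150        160        170        180
CTELLPLIKP QGRVVNVSSI MSVRALKSCS PELQQKFRSE TITEEELVGL MNKFVEDTKK
       190        200        210        220        230        240
GVHQKEGWPS SAYGVTKIGV TVLSRIHARK LSEQRKGDKI LLNACCPGWV RTDMAGPKAT
       250        260        270
KSPEEGAETP VYLALLPPDA EGPHGQFVSE KRVEQW
"""


def clean_sequence(block):
    """Keep only the letters of a numbered, space-separated sequence display."""
    return "".join(ch for ch in block if ch.isalpha()).upper()


def to_fasta(header, sequence, width=60):
    """Format a sequence as FASTA text with lines of at most `width` residues."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


protein = clean_sequence(DISPLAY)
print(len(protein))
print(to_fasta("P16152", protein))
```

The length printed by the sketch can be compared with the "Length: 276 AA" line of the entry as a check that nothing was lost while copying.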
NCBI
LOCUS       HUMCRE       1209 bp    mRNA    linear   PRI 01-NOV-1994
DEFINITION  Human carbonyl reductase mRNA, complete cds.
ACCESSION   J04056 X51818
VERSION     J04056.1  GI:181036
KEYWORDS    carbonyl reductase.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1  (bases 1 to 1209)
  AUTHORS   Wermuth,B., Bohren,K.M., Heinemann,G., von Wartburg,J.P. and
            Gabbay,K.H.
  TITLE     Human carbonyl reductase. Nucleotide sequence analysis of a cDNA
            and amino acid sequence of the encoded protein
  JOURNAL   J. Biol. Chem. 263 (31), 16185-16188 (1988)
   PUBMED   3141401
COMMENT     Original source text: Human placenta, cDNA to mRNA, (library of
            Clontech). Draft entry and computer-readable sequence for [1]
            kindly provided by B.Wermuth, 31-AUG-1988.
FEATURES             Location/Qualifiers
     source          1..1209
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /map="21"
     gene            1..1209
                     /gene="CBR"
     CDS             94..927
                     /gene="CBR"
                     /EC_number="1.1.1.184"
                     /codon_start=1
                     /product="carbonyl reductase"
                     /protein_id="AAA52070.1"
                     /db_xref="GI:181037"
                     /db_xref="GDB:G00-126-610"
                     /translation="MSSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQ
                     AAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTP
                     FHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRS
                     ETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKG
                     DKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQ
                     W"
     polyA_site      1209
                     /gene="CBR"
ORIGIN      212 bp upstream of PstI site.
        1 cagactcgag cagtctctgg aacacgctgc ggggctcccg ggcctgagcc aggtctgttc
       61 tccacgcagg tgttccgcgc gccccgttca gccatgtcgt ccggcatcca tgtagcgctg
      121 gtgactggag gcaacaaggg catcggcttg gccatcgtgc gcgacctgtg ccggctgttc
      181 tcgggggacg tggtgctcac ggcgcgggac gtgacgcggg gccaggcggc cgtacagcag
      241 ctgcaggcgg agggcctgag cccgcgcttc caccagctgg acatcgacga tctgcagagc
      301 atccgcgccc tgcgcgactt cctgcgcaag gagtacgggg gcctggacgt gctggtcaac
      361 aacgcgggca tcgccttcaa ggttgctgat cccacaccct ttcatattca agctgaagtg
      421 acgatgaaaa caaatttctt tggtacccga gatgtgtgca cagaattact ccctctaata
      481 aaaccccaag ggagagtggt gaacgtatct agcatcatga gcgtcagagc ccttaaaagc
      541 tgcagcccag agctgcagca gaagttccgc agtgagacca tcactgagga ggagctggtg
      601 gggctcatga acaagtttgt ggaggataca aagaagggag tgcaccagaa ggagggctgg
      661 cccagcagcg catacggggt gacgaagatt ggcgtcaccg ttctgtccag gatccacgcc
      721 aggaaactga gtgagcagag gaaaggggac aagatcctcc tgaatgcctg ctgcccaggg
      781 tgggtgagaa ctgacatggc gggacccaag gccaccaaga gcccagaaga aggtgcagag
      841 acccctgtgt acttggccct tttgccccca gatgctgagg gtccccatgg acaatttgtt
      901 tcagagaaga gagttgaaca gtggtgagct gggctcacag ctccatccat gggccccatt
      961 ttgtaccttg tcctgagttg gtccaaaggg catttacaat gtcataaata tccttatata
     1021 agaaaaaaaa tgatctctta tcaattagca ctcactaatg tactactaat tgagcaacct
     1081 acgcactcag ttgactacgt aaatctgtca ggtcttttgt gatttcctct gatgcaggag
     1141 aggaaaaatt gtaattgatg aaaataatga atgaaaatca acagatgaat aaatggttct
     1201 ttataagtg
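The CDS coordinates in the GenBank record can be related to the protein length by simple arithmetic. The Python sketch below is a worked check added for illustration; the function name `cds_summary` is hypothetical, and it assumes that the annotated CDS 94..927 includes the terminating stop codon, which agrees with the tga found at positions 925 to 927 of the sequence above.

```python
def cds_summary(start, end):
    """Relate 1-based inclusive CDS coordinates to codon and residue counts."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length // 3
    # The last codon is the stop codon and does not encode a residue.
    return {"length_bp": length, "codons": codons, "residues": codons - 1}


summary = cds_summary(94, 927)
print(summary)
```

Under that assumption the CDS spans 834 bases, that is 278 codons or 277 residues including the initiator methionine. This matches the 277 amino acids reported for the cDNA, and the 276-residue chain in the Swiss-Prot entry once the initiator methionine (the INIT_MET feature) is excluded.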
EMBL
General Information
Primary Accession # J04056
Accession # J04056 X51818
Entry Name EMBL:HSCRE
Molecule Type mRNA
Sequence Length 1209
Entry Division HUM
Sequence Version J04056.1
Creation Date 06-JUL-1989
Modification Date 17-APR-2005
Description
Description Human carbonyl reductase mRNA, complete cds.
Keywords carbonyl reductase.;
Organism Homo sapiens (human)
Organism Classification
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.
References
1. Wermuth,B.; Bohren,K.M.; Heinemann,G.; von Wartburg,J.P.; Gabbay,K.H.;
Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein
J. Biol. Chem. 263(31):16185-16188 (1988)
Pubmed 3141401
Position 1-1209
2. Forrest,G.L.; Akman,S.; Krutzik,S.; Paxton,R.J.; Sparkes,R.S.; Doroshow,J.; Felsted,R.L.; Glover,C.J.; Mohandas,T.; Bachur,N.R.;
Induction of a human carbonyl reductase gene located on chromosome 21
Biochim. Biophys. Acta 1048(2-3):149-155 (1990)
DOI 10.1016/0167-4781(90)90050-C
Pubmed 2182121
Position 1-1209
GDB 181839.
181840.
4571939.
Features
Key Location Qualifier Value
source 1..1209 organism Homo sapiens
map 21
mol_type mRNA
db_xref taxon:9606
codon_start 1
gene CBR
product carbonyl reductase
EC_number 1.1.1.184
db_xref GDB:126610
db_xref GOA:P16152
db_xref HGNC:1548
db_xref HSSP:1N5D
db_xref InterPro:IPR002198
db_xref InterPro:IPR002347
db_xref PDB:1WMA
db_xref UniProtKB/Swiss-Prot:P16152
protein_id AAA52070.1
translation
Sequence
Characteristics Length: 1209 BP, A Count: 302, C Count: 306, G Count: 349, T Count: 252, Others Count: 0
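The base counts reported by EMBL can be cross-checked against the sequence length and turned into a GC content. The Python sketch below is an illustration added here; the function names `base_counts` and `gc_fraction` are hypothetical, and the counts are copied from the Characteristics line of the EMBL entry.

```python
from collections import Counter


def base_counts(sequence):
    """Count A, C, G and T in a nucleotide sequence, ignoring case."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_fraction(counts):
    """Fraction of bases that are G or C, given a dict of base counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases counted")
    return (counts["G"] + counts["C"]) / total


# Counts as reported in the EMBL entry for J04056.
reported = {"A": 302, "C": 306, "G": 349, "T": 252}
total = sum(reported.values())
gc = gc_fraction(reported)
print(total, round(gc * 100, 1))
```

The four counts sum to 1209, the stated sequence length, and give a GC content of about 54.2%. Running `base_counts` on the saved nucleotide sequence should reproduce the reported counts.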
PDB
Title
Crystal structure of human CBR1 in complex with Hydroxy-PP
Authors Rauh, D., Bateman, R., Shokat, K.M.
Primary Citation
Tanaka, M., Bateman, R., Rauh, D., Vaisberg, E., Ramachandani,S., Zhang, C., Hansen, K.C., Burlingame, Shokat,K.M., Adams, C.L. An unbiased cell morphology-based screen for new, biologically active small molecules Plos Biol. v3 pp.128-128 , 2005
History Deposition 2004-07-06 Release 2005-04-26
Experimental Method
Type X-RAY DIFFRACTION
Parameters
Resolution [Å]: 1.24
R-Value: 0.129 (all)
R-Free: 0.167
Space Group: P 21 21 21
Unit Cell
Length [Å]: a 54.45, b 55.35, c 95.93
Angles [°]: alpha 90.00, beta 90.00, gamma 90.00
Molecular Description
Asymmetric Unit: monomer (protein, 276 residues)
Polymer: 1; Molecule: Carbonyl reductase [NADPH] 1; Chains: A; EC No.: 1.1.1.184
Functional Class Oxidoreductase
Source Polymer: 1 Scientific Name: Homo sapiens Common Name: Human
system: Homo sapiens
Chemical Component
Identifier; Name; Formula
SO4; SULFATE ION; O4 S 2-
PE5; 3,6,9,12,15,18,21,24-OCTAOXAHEXACOSAN-1-OL; C18 H38 O9
P33; 3,6,9,12,15,18-HEXAOXAICOSANE-1,20-DIOL; C14 H30 O8
NDP; NADPH DIHYDRO-NICOTINAMIDE-ADENINE-DINUCLEOTIDE PHOSPHATE; C21 H30 N7 O17 P3
AB3; 3-(4-AMINO-1-TERT-BUTYL-1H-PYRAZOLO[3,4-D]PYRIMIDIN-3-YL)PHENOL; C15 H17 N5 O
GO Terms
Polymer: Carbonyl reductase [NADPH] 1 (1WMA:A)
Molecular Function: oxidoreductase activity
Biological Process: metabolism
Cellular Component: none
NiceZyme View of ENZYME: EC 1.1.1.184
Official Name
Carbonyl reductase (NADPH).
Alternative Name(s)
Aldehyde reductase I.
NADPH-dependent carbonyl reductase.
Prostaglandin 9-ketoreductase.
Xenobiotic ketone reductase.
Reaction catalysed
R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH
Comment(s)
Acts on a wide range of carbonyl compounds, including quinones, aromatic aldehydes, ketoaldehydes, daunorubicin, and prostaglandins E and F, reducing them to the corresponding alcohol.
B-specific with respect to NADPH (cf. EC 1.1.1.2).
Cross-references
PROSITE PDOC00060
BRENDA 1.1.1.184
PUMA2 1.1.1.184
PRIAM enzyme-specific profiles 1.1.1.184
Kyoto University LIGAND chemical database 1.1.1.184
IUBMB Enzyme Nomenclature 1.1.1.184
IntEnz 1.1.1.184
MEDLINE Find literature relating to 1.1.1.184
MetaCyc 1.1.1.184
UniProtKB/Swiss-Prot
Q21929, CBR2_CAEEL; P08074, CBR2_MOUSE; Q29529, CBR2_PIG; O75828, DHC3_HUMAN; P16152, DHCA_HUMAN; P48758, DHCA_MOUSE; Q28960, DHCA_PIG; Q5RCU5, DHCA_PONPY; P47844, DHCA_RABIT; P47727, DHCA_RAT; Q8SPU8, DHRS4_BOVIN; Q9BTZ2, DHRS4_HUMAN; Q99LB2, DHRS4_MOUSE; Q8WNV7, DHRS4_PIG; Q5RCF8, DHRS4_PONPY; Q9GKX2, DHRS4_RABIT; Q8VID1, DHRS4_RAT;
HGNC
Core Data
Approved Symbol: CBR1
Approved Name: carbonyl reductase 1
HGNC ID: HGNC:1548
Status: Approved
Chromosome: 21q22.1
Previous Symbols: CBR
Previous Names: -
Aliases: -
Database Links
Enzyme IDs: 1.1.1.184
Pubmed IDs: 8432528
OMIM ID (mapped data): 114830
Entrez Gene ID (mapped data): 873
RefSeq (mapped data): NM_001757
UniProt ID (mapped data): P16152
Gene Symbol Links: GenBank, UCSC Browser, UCSC Index, Ensembl GeneView, GENATLAS, GeneCards, GeneClinics/GeneTests, Vega
GENE CARDS
Chromosome: 21
Entrez Gene cytogenetic band: 21q22.13
Ensembl cytogenetic band: 21q22.12
Nature (405: 311-319) cytogenetic band: 21q22.13
Gene in genomic location: bands according to Ensembl, locations according to (and/or Entrez Gene and/or Ensembl if different)
GeneLoc gene densities for chromosome 21
GC21P036364:
              GeneLoc                      Nature 405:311-319
Start:        36,364,191 bp from pter      23,019,072 bp from centromere
End:          36,367,332 bp from pter      23,022,213 bp from centromere
Size:         3,141 bases                  3,142 bases
Orientation:  plus strand                  plus strand
GENATLAS
FLASH GENE
Symbol CBR1 last update : 30/4/2002
HGNC name carbonyl reductase 1
HGNC id 1548
Location 21q22.2
Synonym symbol(s) CRN
EC number 1.1.1.184, 1.1.1.189, 1.1.1.197
DNA
TYPE functioning gene
STRUCTURE 3.1 kb, 3 exon(s)
10 kb 5' upstream gene genomic sequence study
SUBCELLULAR LOCALIZATION intracellular, cytoplasm, cytosolic
FAMILY short chain dehydrogenases/reductases (SDR) family
CATEGORY enzyme
basic FUNCTION carbonyl reductase, NADPH-dependent oxidoreductase, prostaglandin-E2 9-reductase, prostaglandin 9-ketoreductase
ENSEMBL
Genomic Location
This gene can be found on Chromosome 21 at location 36,364,191-36,367,332.
The start of this gene is located in Contig AP000688.1.1.171703.
Description Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197). Source: Uniprot/SWISSPROT P16152
Transcript ENST00000290349
Transcript
CBR1 (HGNC Symbol ID)
This transcript is a member of the human CCDS set: CCDS13641
Transcript information Exons: 3; Transcript length: 1,209 bp; Protein length: 277 residues
Transcript structure
Protein features
PFAM
Oxidoreductase
Crystal structure of abad/hsd10 with a bound inhibitor
Short chain dehydrogenase
This family contains a wide variety of dehydrogenases.
FAD/NAD(P)-binding Rossmann fold Superfamily
This family is a member of the FAD/NAD(P)-binding Rossmann fold Superfamily clan. This clan includes the following Pfam members: Trp_halogenase; TrkA_N; ThiF; Thi4; THF_DHG_CYH_C; Shikimate_DH; Semialdhyde_dh; SE; Saccharop_dh; RmlD_sub_bind; Pyr_redox_2; Pyr_redox; Polysacc_synt_2; PDH; OCD_Mu_crystall; NmrA; NAD_Gly3P_dh_N; NAD_binding_5; NAD_binding_4; NAD_binding_3; NAD_binding_2; Mur_ligase; Mqo; Mannitol_dh; Malic_M; Lycopene_cycl; Ldh_1_N; KR; IlvN; HI0933_like; Gp_dh_N; GMC_oxred_N; GIDA; GFO_IDH_MocA; GDI; G6PD_N; FMO-like; FAD_binding_3; FAD_binding_2; F420_oxidored; Epimerase; ELFV_dehydrog; DXP_reductoisom; DapB_N; DAO; CoA_binding; ApbA; Amino_oxidase; AlaDh_PNT_C; AdoHcyase_NAD; ADH_zinc_N; adh_short; 3HCDH_N; 3Beta_HSD; 2-Hacid_dh_C; UDPG_MGDP_dh_N.
The short-chain dehydrogenases/reductases (SDR) family (PUBMED:7742302) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases (PUBMED:2707261, PUBMED:1889416, PUBMED:1740120). Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least 2 domains (PUBMED:6789320), the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains (PUBMED:6789320).
Input Protein sequence in FASTA format
>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
Input multiple Protein sequences in FASTA format
>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
>sp|P48758|DHCA_MOUSE Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Mus musculus (Mouse).
SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQLDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEVTMETNFFGTRDVCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW
>sp|Q28960|DHCA_PIG Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (20-beta-hydroxysteroid dehydrogenase) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydr
SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQLDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNVCTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA
>sp|P47844|DHCA_RABIT Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Oryctolagus cuniculus (Rabbit).
PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQLDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDVCTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW
>sp|P47727|DHCA_RAT Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) - Rattus norvegicus (Rat).
SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDVCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW
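The multi-FASTA input above can be read into a dictionary before it is passed to any alignment tool. The following is a minimal sketch, assuming plain FASTA text; the function name `parse_fasta` and the shortened example records are illustrative and not part of the original analysis.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record id: sequence} for FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[current] += line
    return records


# Shortened records (first 60 residues only) taken from the input above.
example = """>sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ
>sp|P48758|DHCA_MOUSE Carbonyl reductase [NADPH] 1
SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQ
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```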
Input nucleotide sequence in FASTA format.
>gi|181036|gb|J04056.1|HUMCRE Human carbonyl reductase mRNA, complete cds
CAGACTCGAGCAGTCTCTGGAACACGCTGCGGGGCTCCCGGGCCTGAGCCAGGTCTGTTCTCCACGCAGGTGTTCCGCGCGCCCCGTTCAGCCATGTCGTCCGGCATCCATGTAGCGCTGGTGACTGGAGGCAACAAGGGCATCGGCTTGGCCATCGTGCGCGACCTGTGCCGGCTGTTCTCGGGGGACGTGGTGCTCACGGCGCGGGACGTGACGCGGGGCCAGGCGGCCGTACAGCAGCTGCAGGCGGAGGGCCTGAGCCCGCGCTTCCACCAGCTGGACATCGACGATCTGCAGAGCATCCGCGCCCTGCGCGACTTCCTGCGCAAGGAGTACGGGGGCCTGGACGTGCTGGTCAACAACGCGGGCATCGCCTTCAAGGTTGCTGATCCCACACCCTTTCATATTCAAGCTGAAGTGACGATGAAAACAAATTTCTTTGGTACCCGAGATGTGTGCACAGAATTACTCCCTCTAATAAAACCCCAAGGGAGAGTGGTGAACGTATCTAGCATCATGAGCGTCAGAGCCCTTAAAAGCTGCAGCCCAGAGCTGCAGCAGAAGTTCCGCAGTGAGACCATCACTGAGGAGGAGCTGGTGGGGCTCATGAACAAGTTTGTGGAGGATACAAAGAAGGGAGTGCACCAGAAGGAGGGCTGGCCCAGCAGCGCATACGGGGTGACGAAGATTGGCGTCACCGTTCTGTCCAGGATCCACGCCAGGAAACTGAGTGAGCAGAGGAAAGGGGACAAGATCCTCCTGAATGCCTGCTGCCCAGGGTGGGTGAGAACTGACATGGCGGGACCCAAGGCCACCAAGAGCCCAGAAGAAGGTGCAGAGACCCCTGTGTACTTGGCCCTTTTGCCCCCAGATGCTGAGGGTCCCCATGGACAATTTGTTTCAGAGAAGAGAGTTGAACAGTGGTGAGCTGGGCTCACAGCTCCATCCATGGGCCCCATTTTGTACCTTGTCCTGAGTTGGTCCAAAGGGCATTTACAATGTCATAAATATCCTTATATAAGAAAAAAAATGATCTCTTATCAATTAGCACTCACTAATGTACTACTAATTGAGCAACCTACGCACTCAGTTGACTACGTAAATCTGTCAGGTCTTTTGTGATTTCCTCTGATGCAGGAGAGGAAAAATTGTAATTGATGAAAATAATGAATGAAAATCAACAGATGAATAAATGGTTCTTTATAAGTG
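The mRNA and protein inputs can be cross-checked by translating the coding region. The sketch below uses the standard genetic code and a short fragment copied from around the start codon of the J04056 record above. Starting at the first ATG is a simplifying assumption of this example, and `translate_first_orf` is a hypothetical helper name.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(mrna: str) -> str:
    """Translate from the first ATG up to a stop codon or the end of input."""
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")  # assumption: the first ATG is the start codon
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Fragment of the mRNA above, spanning the start codon.
fragment = "TTCAGCCATGTCGTCCGGCATCCATGTAGCGCTGGTGACTGGAGGCAACAAGGGC"
print(translate_first_orf(fragment))  # MSSGIHVALVTGGNKG
```

The translated fragment begins with the initiator methionine, whereas the protein records used in this section start at the following residue (SSGIHV...), which accounts for the 277-residue versus 276-residue lengths quoted by different tools.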
B. TOOLS AND TECHNIQUES
1. HOMOLOGY AND SIMILARITY TOOLS
1.1. PAIRWISE SEQUENCE ANALYSIS TOOLS
1.1.1 BLASTP
Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
RID: 1149255627-4454-175385043004.BLASTQ1
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples
3,658,925 sequences; 1,257,151,091 total letters
Query= sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (EC 1.1.1.189) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+]) (EC 1.1.1.197) - Homo
Length=276
ALIGNMENT
> gi|15215242|gb|AAH12714.1| Carbonyl reductase 1 [Mus musculus]
Length=277

 Score = 482 bits (1241), Expect = 6e-135
 Identities = 242/276 (87%), Positives = 254/276 (92%), Gaps = 0/276 (0%)

Query  1    SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ  60
            SS   VALVTG NKGIG AI RDLCR FSGDVVL ARD  RGQ AVQ+LQAEGLSPRFHQ
Sbjct  2    SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQ  61

Query  61   LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV  120
            LDID+ QSIRALRDFL KEYGGLDVLVNNAGIAFKV D TPFHIQAEVTMKTNFFGTRDV
Sbjct  62   LDIDNPQSIRALRDFLLKEYGGLDVLVNNAGIAFKVNDDTPFHIQAEVTMKTNFFGTRDV  121

Query  121  CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK  180
            C ELLPLIKPQGRVVNVSS++S+RALK+C  ELQQKFRSETITEEELVGLMNKFVEDTKK
Sbjct  122  CKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKK  181

Query  181  GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT  240
            GVH +EGWP+SAYGVTKIGVTVLSRI ARKL+EQR+GDKILLNACCPGWVRTDMAGPKAT
Sbjct  182  GVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRRGDKILLNACCPGWVRTDMAGPKAT  241

Query  241  KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            KSPEEGAETPVYLALLPPDAEGPHGQFV +K+VE W
Sbjct  242  KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW  277
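The "Identities" figure in a BLAST hit can be recomputed from the aligned rows. This is a minimal sketch, assuming two aligned rows of equal length with '-' marking gaps; `identity_stats` is an illustrative name, and the rows are the last alignment block of the mouse hit above. The percentage is truncated rather than rounded, which matches the report (242/276 is shown as 87%).

```python
def identity_stats(query: str, subject: str):
    """Count identical columns and gap columns in two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identical, len(query), gaps


# Last block of the human/mouse alignment (Query 241-276, Sbjct 242-277).
query_row = "KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW"
sbjct_row = "KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW"

identical, length, gaps = identity_stats(query_row, sbjct_row)
print(f"Identities = {identical}/{length} ({100 * identical // length}%), "
      f"Gaps = {gaps}/{length}")
```

Applying the same count to all five blocks of a hit reproduces the totals printed in its header line.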
> gi|76779821|gb|AAI05894.1| Carbonyl reductase [Rattus norvegicus]
Length=277

 Score = 478 bits (1231), Expect = 9e-134
 Identities = 238/276 (86%), Positives = 254/276 (92%), Gaps = 0/276 (0%)

Query  1    SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ  60
            SS   VALVTG NKGIG AIVRDLCR F GDVVLTARD +RG  AV+QLQ EGLSPRFHQ
Sbjct  2    SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQ  61

Query  61   LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV  120
            LDID+ QSIRALRDFL +EYGGL+VLVNNAGIAFKV DPTPFHIQAEVTMKTNFFGT+DV
Sbjct  62   LDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDV  121

Query  121  CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK  180
            C ELLP+IKPQGRVVNVSS +S+RALKSCSPELQQKFRSETITEEELVGLMNKFVED KK
Sbjct  122  CKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFVEDAKK  181

Query  181  GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT  240
            GVH KEGWP+SAYGVTKIGVTVLSRI+ARKL+E+R+ DKILLNACCPGWVRTDMAGPKAT
Sbjct  182  GVHAKEGWPNSAYGVTKIGVTVLSRIYARKLTEERREDKILLNACCPGWVRTDMAGPKAT  241

Query  241  KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            KSPEEGAETPVYLALLPP AEGPHGQFV +K+VE W
Sbjct  242  KSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW  277
> gi|1352257|sp|P47844|DHCA_RABIT Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1)
 gi|458714|gb|AAA77670.1| NADPH-dependent carbonyl reductase
Length=277

 Score = 464 bits (1195), Expect = 1e-129
 Identities = 230/271 (84%), Positives = 246/271 (90%), Gaps = 0/271 (0%)

Query  6    VALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDD  65
            VALVTG NKG+G AI R LCRLFSGDV+LTA+D  +GQAAVQQLQAEGLSPRFHQLDI D
Sbjct  7    VALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQLDITD  66

Query  66   LQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELL  125
            LQSIRALRDFLR+ YGGL+VLVNNA IAFK+ D TPFHIQAEVTMKTNF GTRDVCTELL
Sbjct  67   LQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDVCTELL  126

Query  126  PLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQK  185
            PL++P GRVVNVSS+  +RALKSCSPELQQKFRSETITEEELVGLM KFVEDTKKGVHQ
Sbjct  127  PLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQT  186

Query  186  EGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEE  245
            EGWP +AYGVTK+GVTVLSRI AR LSE R GDKIL+NACCPGWVRTDM GP ATKSPEE
Sbjct  187  EGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNATKSPEE  246

Query  246  GAETPVYLALLPPDAEGPHGQFVSEKRVEQW  276
            GAETPVYLALLPPDAEGPHGQFV +K+VEQW
Sbjct  247  GAETPVYLALLPPDAEGPHGQFVMDKKVEQW  277
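The Expect values reported above follow from the bit scores through the Karlin-Altschul relation E ≈ m · n · 2^(-S'), where S' is the bit score, m the query length and n the database size in letters. The sketch below applies that relation with the raw lengths from the BLAST header. BLAST itself applies an effective-length correction, so the result is only expected to agree with the report to within about an order of magnitude; `approx_evalue` is an illustrative name.

```python
def approx_evalue(bit_score: float, query_len: int, db_letters: int) -> float:
    """Karlin-Altschul estimate E = m * n * 2**(-S') with uncorrected lengths."""
    return query_len * db_letters * 2.0 ** (-bit_score)


# Mouse hit: 482 bits, query of 276 residues, database of 1,257,151,091 letters.
e_mouse = approx_evalue(482, 276, 1_257_151_091)
print(f"approximate E-value: {e_mouse:.1e}  (reported: 6e-135)")
```

Each additional bit halves the estimate, which is why the rat hit (478 bits) has an Expect value roughly 16 times larger than the mouse hit.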
1.1.2 FASTA
FASTA searches a protein or DNA sequence data bank
 version 3.4t25 Sept 2, 2005
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
Query library @ vs +uniprot library
searching /ebi/services/idata/v1503/fastadb/uniprot library
 1>>>Sequence - 276 aa
 vs UniProt library
1044150180 residues in 3185498 sequences
 statistics sampled from 60000 to 3176257 sequences
 Expectation_n fit: rho(ln(x))= 4.9912+/-0.000183; mu= 10.3675+/- 0.010
 mean_var=68.0021+/-13.901, 0's: 45 Z-trim: 193 B-trim: 5575 in 2/65
 Lambda= 0.155530
FASTA (3.47 Mar 2004) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 36, opt: 24, open/ext: -10/-2, width: 16
 Scan time: 17.890
The best scores are:
>>UNIPROT:Q3SZD7_BOVIN Q3SZD7 Carbonyl reductase 1. (277 aa)
 initn: 1613 init1: 1613 opt: 1613 Z-score: 1959.4 bits: 370.4 E(): 7.9e-101
Smith-Waterman score: 1613; 88.768% identity (95.652% similar) in 276 aa overlap (1-276:2-277)
10 20 30 40 50 Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH ::. ::::::.:::::..::::::: ::::::::::: .::.::::::::::::: ::UNIPRO MSSSNCVALVTGANKGIGFVIVRDLCRRFSGDVVLTARDEARGRAAVQQLQAEGLSPLFH 10 20 30 40 50 60
60 70 80 90 100 110 Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD :::::: :::::::::::::::::::::::::::::.:: ::::::::::::::::::::UNIPRO QLDIDDRQSIRALRDFLRKEYGGLDVLVNNAGIAFKTADTTPFHIQAEVTMKTNFFGTRD 70 80 90 100 110 120
120 130 140 150 160 170 Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK ::::::::::::::::::::..:: .::.:: ::::::::::::::::::::::::::::UNIPRO VCTELLPLIKPQGRVVNVSSFVSVNSLKKCSRELQQKFRSETITEEELVGLMNKFVEDTK 130 140 150 160 170 180
180 190 200 210 220 230 Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA
.:::.:::::..:::::::::::::::::::::::: ::::::::::::::::::.::::UNIPRO NGVHRKEGWPDTAYGVTKIGVTVLSRIHARKLSEQRGGDKILLNACCPGWVRTDMGGPKA 190 200 210 220 230 240
240 250 260 270 Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW .::::::::::::::::: :::::::.:.::::: ::UNIPRO SKSPEEGAETPVYLALLPSDAEGPHGEFISEKRVVQW 250 260 270>>UNIPROT:Q91X28_MOUSE Q91X28 Carbonyl reductase 1. (277 aa) initn: 1597 init1: 1597 opt: 1597 Z-score: 1940.0 bits: 366.8 E(): 9.5e-100Smith-Waterman score: 1597; 87.681% identity (94.565% similar) in 276 aa overlap (1-276:2-277) 10 20 30 40 50 Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH ::. ::::::.:::::.::.::::: :::::::.::: :::.:::.:::::::::::UNIPRO MSSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFH 10 20 30 40 50 60 60 70 80 90 100 110 Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD :::::. :::::::::: ::::::::::::::::::: : ::::::::::::::::::::UNIPRO QLDIDNPQSIRALRDFLLKEYGGLDVLVNNAGIAFKVNDDTPFHIQAEVTMKTNFFGTRD 70 80 90 100 110 120
120 130 140 150 160 170 Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK :: :::::::::::::::::..:.::::.: ::::::::::::::::::::::::::::UNIPRO VCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTK 130 140 150 160 170 180
180 190 200 210 220 230 Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA :::: .::::.:::::::::::::::: ::::.:::.:::::::::::::::::::::::UNIPRO KGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRRGDKILLNACCPGWVRTDMAGPKA 190 200 210 220 230 240
240 250 260 270 Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW :::::::::::::::::::::::::::::..:.:: :UNIPRO TKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW 250 260 270
>>UNIPROT:Q3KR58_RAT Q3KR58 Carbonyl reductase. (277 aa)
 initn: 1582 init1: 1582 opt: 1582 Z-score: 1921.8 bits: 363.4 E(): 9.8e-99
Smith-Waterman score: 1582; 86.232% identity (94.565% similar) in 276 aa overlap (1-276:2-277)
10 20 30 40 50 Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFH :: ::::::.:::::.:::::::: : ::::::::: .::. ::.:::.::::::::UNIPRO MSSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFH 10 20 30 40 50 60
60 70 80 90 100 110 Sequen QLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRD :::::. :::::::::: .:::::.::::::::::::.::::::::::::::::::::.:UNIPRO QLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQD 70 80 90 100 110 120
120 130 140 150 160 170 Sequen VCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTK :: ::::.:::::::::::: .:.::::::::::::::::::::::::::::::::::.:UNIPRO VCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFVEDAK 130 140 150 160 170 180
180 190 200 210 220 230
Sequen KGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKA :::: :::::.::::::::::::::::.::::.:.:. ::::::::::::::::::::::UNIPRO KGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLTEERREDKILLNACCPGWVRTDMAGPKA 190 200 210 220 230 240
240 250 260 270 Sequen TKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW ::::::::::::::::::: :::::::::..:.:: :UNIPRO TKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW 250 260 270 >>UNIPROT:DHCA_PIG Q28960 Carbonyl reductase [NADPH] 1 ( (288 aa) initn: 1547 init1: 1547 opt: 1547 Z-score: 1879.1 bits: 355.6 E(): 2.3e-96Smith-Waterman score: 1547; 84.420% identity (94.565% similar) in 276 aa overlap (1-276:1-276) 10 20 30 40 50 60Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ ::. .::::::.:::::.:::::::: :.::::::::::.::::::.:::::::::::::UNIPRO SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQ 10 20 30 40 50 60
70 80 90 100 110 120Sequen LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV ::: :::::::: :::::::::::::::::.:::.. .:::::::::.::::::.:::.:UNIPRO LDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNV 70 80 90 100 110 120
130 140 150 160 170 180Sequen CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK ::::::::::::::::::: .::::. :::::::::.:::::::::::::::::::::.UNIPRO CTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKN 130 140 150 160 170 180
190 200 210 220 230 240Sequen GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT :::.:::: .:.::::::::.:::::.:::: ::: ::::::::::::::::::.:::: UNIPRO GVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAP 190 200 210 220 230 240
250 260 270 Sequen KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW :::: :::::::::::: ::::::::::..:.: .: UNIPRO KSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA 250 260 270 280
>>UNIPROT:DHCA_RABIT P47844 Carbonyl reductase [NADPH] 1 (276 aa)
 initn: 1542 init1: 1542 opt: 1542 Z-score: 1873.3 bits: 354.4 E(): 4.9e-96
Smith-Waterman score: 1542; 84.871% identity (94.834% similar) in 271 aa overlap (6-276:6-276)
10 20 30 40 50 60Sequen SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ ::::::.:::.:.::.: :::::::::.:::.: ..:::::::::::::::::::UNIPRO PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQ 10 20 30 40 50 60
70 80 90 100 110 120Sequen LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV ::: :::::::::::::. ::::.:::::: ::::. : ::::::::::::::: :::::UNIPRO LDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDV 70 80 90 100 110 120
130 140 150 160 170 180Sequen CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK
:::::::..: ::::::::. .::::::::::::::::::::::::::::.::::::::UNIPRO CTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKK 130 140 150 160 170 180
190 200 210 220 230 240Sequen GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT :::: ::::..::::::.::::::::.::.:::.: :::::.::::::::::::.::.::UNIPRO GVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNAT 190 200 210 220 230 240
250 260 270 Sequen KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW :::::::::::::::::::::::::::: .:.::::UNIPRO KSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW 250 260 270
1.1.3 EMBOSS ALIGN (NEEDLE)
########################################
# Program: needle
# Rundate: Fri Jun 02 15:08:57 2006
# Align_format: srspair
# Report_file: /ebi/extserv/old-work/needle-20060602-15085486674353.output
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: DHCA_HUMAN
# 2: DCXR_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 317
# Identity:      73/317 (23.0%)
# Similarity:   108/317 (34.1%)
# Gaps:         114/317 (36.0%)
# Score: 119.5
#
#
#=======================================

DHCA_HUMAN         1     SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAV     46
                         .:|..| ||||..||||...|:.|..  :|..|:.   |:|.||.:
DCXR_HUMAN         1 MELFLAGRRV-LVTGAGKGIGRGTVQALHA--TGARVVA---VSRTQADL     44

DHCA_HUMAN        47 QQLQAE--GLSPRFHQLDIDDLQSI-RALRDFLRKEYGGLDVLVNNAGIA     93
                     ..|..|  |:.|..  :|:.|.::. |||     ...|.:|:|||||.:|
DCXR_HUMAN        45 DSLVRECPGIEPVC--VDLGDWEATERAL-----GSVGPVDLLVNNAAVA     87

DHCA_HUMAN        94 F--------KVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVV    135
                     .        |.|....|.:.....::.:....|.:....:|     |.:|
DCXR_HUMAN        88 LLQPFLEVTKEAFDRSFEVNLRAVIQVSQIVARGLIARGVP-----GAIV    132

DHCA_HUMAN       136 NVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQK    185
                     ||||..|.||:.:
DCXR_HUMAN       133 NVSSQCSQRAVTN-------------------------------------    145

DHCA_HUMAN       186 EGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMA    235
                         .|.|..||..:.:|:::.|.:|...    ||.:||..|..|.|.|.
DCXR_HUMAN       146 ----HSVYCSTKGALDMLTKVMALELGPH----KIRVNAVNPTVVMTSMG    187

DHCA_HUMAN       236 GPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW             276
                       :||.|....|:|.:...        |.|:|...:.|...
DCXR_HUMAN       188 --QATWSDPHKAKTMLNRI--------PLGKFAEVEHVVNAILFLLSDRS    227

DHCA_HUMAN       277                                                   276

DCXR_HUMAN       228 GMTTGSTLPVEGGFWAC                                     244

#---------------------------------------
#---------------------------------------
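needle computes a global (Needleman-Wunsch) alignment. The sketch below shows only the score recurrence, with a simplified scoring scheme: match/mismatch values and a linear gap penalty stand in for the EBLOSUM62 matrix and the affine gap penalties (10.0 open, 0.5 extend) used in the run above, so it does not reproduce the reported score of 119.5. The function name and parameters are assumptions of this example.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score with a linear gap penalty (simplified scheme)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 2: three matches, one gap

# Consistency check on the needle summary above: a global alignment of
# 276 and 244 residues over 317 columns must contain this many gap columns.
alignment_length, len_dhca, len_dcxr = 317, 276, 244
implied_gaps = (alignment_length - len_dhca) + (alignment_length - len_dcxr)
print(implied_gaps)  # 114, matching "Gaps: 114/317"
```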
1.1.4 EMBOSS ALIGN (WATER)

########################################
# Program: water
# Rundate: Fri Jun 02 15:11:06 2006
# Align_format: srspair
# Report_file: /ebi/extserv/old-work/water-20060602-15110276417491.output
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: DHCA_HUMAN
# 2: DCXR_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 268
# Identity:      71/268 (26.5%)
# Similarity:   103/268 (38.4%)
# Gaps:          75/268 (28.0%)
# Score: 130.0
#
#
#=======================================

DHCA_HUMAN         8 LVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAE--GLS     55
                     ||||..||||...|:.|..  :|..|:.   |:|.||.:..|..|  |:.
DCXR_HUMAN        11 LVTGAGKGIGRGTVQALHA--TGARVVA---VSRTQADLDSLVRECPGIE     55

DHCA_HUMAN        56 PRFHQLDIDDLQSI-RALRDFLRKEYGGLDVLVNNAGIAF--------KV     96
                     |..  :|:.|.::. |||     ...|.:|:|||||.:|.        |.
DCXR_HUMAN        56 PVC--VDLGDWEATERAL-----GSVGPVDLLVNNAAVALLQPFLEVTKE     98

DHCA_HUMAN        97 ADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRAL    146
                     |....|.:.....::.:....|.:....:|     |.:|||||..|.||:
DCXR_HUMAN        99 AFDRSFEVNLRAVIQVSQIVARGLIARGVP-----GAIVNVSSQCSQRAV    143

DHCA_HUMAN       147 KSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVT    196
                     .:                                         .|.|..|
DCXR_HUMAN       144 TN-----------------------------------------HSVYCST    152

DHCA_HUMAN       197 KIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEG    246
                     |..:.:|:::.|.:|...    ||.:||..|..|.|.|.  :||.|....
DCXR_HUMAN       153 KGALDMLTKVMALELGPH----KIRVNAVNPTVVMTSMG--QATWSDPHK    196

DHCA_HUMAN       247 AETPVYLALLPPDAEGPH    264
                     |:|.:....|...||..|
DCXR_HUMAN       197 AKTMLNRIPLGKFAEVEH    214

#---------------------------------------
#---------------------------------------
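water differs from needle in that it reports the best local (Smith-Waterman) alignment, which is why its alignment starts at residue 8 of DHCA_HUMAN and stops at residue 264 instead of covering both sequences end to end. The sketch below shows the local recurrence with the same simplifications as before: illustrative match/mismatch scores and a linear gap penalty instead of EBLOSUM62 with affine gaps. Names and default values are assumptions of this example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (simplified)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment restart anywhere; this is the
            # only change relative to the global recurrence.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Only the shared core "ACG" is aligned; the flanking residues are ignored.
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))  # 6
```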
1.2 MULTIPLE SEQUENCE ANALYSIS TOOLS
1.2.1 CLUSTAL W
CLUSTAL W (1.83) multiple sequence alignment

sp|P48758|DHCA_MOUSE     SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGLSPRFHQ 60
sp|P47727|DHCA_RAT       SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGLSPRFHQ 60
sp|P16152|DHCA_HUMAN     SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ 60
sp|Q28960|DHCA_PIG       SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGLSPRFHQ 60
sp|P47844|DHCA_RABIT     PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGLSPRFHQ 60
                         .*.  ******.***:*:**.* *** * ***:*:*:*  :*: **::**:*********

sp|P48758|DHCA_MOUSE     LDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEVTMETNFFGTRDV 120
sp|P47727|DHCA_RAT       LDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEVTMKTNFFGTQDV 120
sp|P16152|DHCA_HUMAN     LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV 120
sp|Q28960|DHCA_PIG       LDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAELTMKTNFMGTRNV 120
sp|P47844|DHCA_RABIT     LDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEVTMKTNFDGTRDV 120
                         *** : ****** *** : ****:****:* ***:: : ********:**:*** **::*

sp|P48758|DHCA_MOUSE     CKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETITEEELVGLMNKFVEDTKK 180
sp|P47727|DHCA_RAT       CKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETITEEELVGLMNKFIEDAKK 180
sp|P16152|DHCA_HUMAN     CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK 180
sp|Q28960|DHCA_PIG       CTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETITEEELVGLMNKFVEDTKN 180
sp|P47844|DHCA_RABIT     CTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETITEEELVGLMKKFVEDTKK 180
                         *.****:::* ********   :***:.*  ******:*************:**:**:*:

sp|P48758|DHCA_MOUSE     GVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNACCPGWVRTDMAGPKAT 240
sp|P47727|DHCA_RAT       GVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNACCPGWVRTDMAGPKAT 240
sp|P16152|DHCA_HUMAN     GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT 240
sp|Q28960|DHCA_PIG       GVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNACCPGWVRTDMGGPKAP 240
sp|P47844|DHCA_RABIT     GVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNACCPGWVRTDMGGPNAT 240
                         ***  ***..::*****:**:***** **:* *.*  ****:************.**:*.

sp|P48758|DHCA_MOUSE     KSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW------------ 276
sp|P47727|DHCA_RAT       KSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW------------ 276
sp|P16152|DHCA_HUMAN     KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW------------ 276
sp|Q28960|DHCA_PIG       KSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA 288
sp|P47844|DHCA_RABIT     KSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW------------ 276
                         **** ************..********* :*:*  *
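The symbols under each CLUSTAL W block mark column conservation: '*' for an identical residue in all sequences, ':' for a strongly conserved group, '.' for a weakly conserved group. The sketch below recomputes that line for the first twelve columns of the alignment above. The group definitions are the ones commonly documented for CLUSTAL W 1.8x and are quoted here from memory, so they should be treated as an assumption and checked against the program documentation; `conservation_line` is an illustrative name.

```python
# Residue groups as commonly documented for CLUSTAL W (assumption, see lead-in).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return the CLUSTAL-style consensus symbols for equal-length rows."""
    symbols = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")          # columns with gaps are not scored
        elif len(residues) == 1:
            symbols.append("*")          # fully conserved
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")          # strongly similar residues
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")          # weakly similar residues
        else:
            symbols.append(" ")
    return "".join(symbols)


# First 12 columns: mouse, rat, human, pig, rabbit (same order as above).
rows = ["SSSRPVALVTGA", "SSDRPVALVTGA", "SSGIHVALVTGG",
        "SSNTRVALVTGA", "PSDRRVALVTGA"]
print(repr(conservation_line(rows)))
```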
1.2.2 T-COFFEE
SCORE=81
*
 BAD AVG GOOD
*
sp|P16152|DHCA_ : 82
sp|P48758|DHCA_ : 81
sp|Q28960|DHCA_ : 81
sp|P47844|DHCA_ : 80
sp|P47727|DHCA_ : 81

sp|P16152|DHCA_ SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGL
sp|P48758|DHCA_ SSSRPVALVTGANKGIGFAITRDLCRKFSGDVVLAARDEERGQTAVQKLQAEGL
sp|Q28960|DHCA_ SSNTRVALVTGANKGIGFAIVRDLCRQFAGDVVLTARDVARGQAAVKQLQAEGL
sp|P47844|DHCA_ PSDRRVALVTGANKGVGFAITRALCRLFSGDVLLTAQDEAQGQAAVQQLQAEGL
sp|P47727|DHCA_ SSDRPVALVTGANKGIGFAIVRDLCRKFLGDVVLTARDESRGHEAVKQLQTEGL
Cons            .*.  ******.***:*:**.* *** * ***:*:*:*  :*: **::**:***

sp|P16152|DHCA_ SPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEV
sp|P48758|DHCA_ SPRFHQLDIDNPQSIRALRDFLLKEYGGLDVLVNKAGIAFKVNDDTPFHIQAEV
sp|Q28960|DHCA_ SPRFHQLDIIDLQSIRALCDFLRKEYGGLDVLVNNAAIAFQLDNPTPFHIQAEL
sp|P47844|DHCA_ SPRFHQLDITDLQSIRALRDFLRRAYGGLNVLVNNAVIAFKMEDTTPFHIQAEV
sp|P47727|DHCA_ SPRFHQLDIDNPQSIRALRDFLLQEYGGLNVLVNNAGIAFKVVDPTPFHIQAEV
Cons            ********* : ****** *** : ****:****:* ***:: : ********:

sp|P16152|DHCA_ TMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETI
sp|P48758|DHCA_ TMETNFFGTRDVCKELLPLIKPQGRVVNVSSMVSLRALKNCRLELQQKFRSETI
sp|Q28960|DHCA_ TMKTNFMGTRNVCTELLPLIKPQGRVVNVSSTEGVRALNECSPELQQKFKSETI
sp|P47844|DHCA_ TMKTNFDGTRDVCTELLPLMRPGGRVVNVSSMTCLRALKSCSPELQQKFRSETI
sp|P47727|DHCA_ TMKTNFFGTQDVCKELLPIIKPQGRVVNVSSSVSLRALKSCSPELQQKFRSETI
Cons            **:*** **::**.****:::* ********   :***:.*  ******:****

sp|P16152|DHCA_ TEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRK
sp|P48758|DHCA_ TEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRR
sp|Q28960|DHCA_ TEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRA
sp|P47844|DHCA_ TEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRG
sp|P47727|DHCA_ TEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERR
Cons            *********:**:**:*:***  ***..::*****:**:***** **:* *.*

sp|P16152|DHCA_ GDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSE
sp|P48758|DHCA_ EDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQD
sp|Q28960|DHCA_ GDKILLNACCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTD
sp|P47844|DHCA_ GDKILVNACCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMD
sp|P47727|DHCA_ EDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQD
Cons             ****:************.**:*.**** ************..********* :

sp|P16152|DHCA_ KRVEQW------------
sp|P48758|DHCA_ KKVEPW------------
sp|Q28960|DHCA_ KKVVEWGVPPESYPWVNA
sp|P47844|DHCA_ KKVEQW------------
sp|P47727|DHCA_ KKVEPW------------
Cons            *:*  *
2. FUNCTIONAL ANALYSIS TOOLS
2.1. PATTERN SEARCH
2.1.1 SCAN PROSITE
hits by patterns: [1 hit (by 1 pattern) on 1 sequence]
Hits by PS00061 ADH_SHORT Short-chain dehydrogenases/reductases family signature:
sp-P16152-DHC~ (276 aa)
180 - 208: KgvhqkegwpSsaYGVTKIGVtVLSrIHA
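A PROSITE pattern hit such as the one above can be reproduced by converting the pattern into a regular expression. The sketch below handles the pattern elements needed here (residue classes, exclusions in braces, `x`, and repeat counts) but not the `<` and `>` anchors. The PS00061 pattern string is quoted from memory of the PROSITE entry and is an assumption that should be checked against the current PROSITE release; it is consistent with the capitalised positions in the hit line above. `prosite_to_regex` is an illustrative helper name.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern (without '<' or '>' anchors) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                         # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"     # any residue except these
        else:
            regex = core                        # single residue or [..] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Assumed text of PS00061 (ADH_SHORT); verify against PROSITE before reuse.
PS00061 = ("[LIVSPADNK]-x(9)-{P}-x(2)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-"
           "{PC}-[SAGFYR]-[LIVMSTAGD]-x-{K}-[LIVMFYW]-{D}-x-{YR}-"
           "[LIVMFYWGAPTHQ]-[GSACQRHM].")

# DHCA_HUMAN (P16152), 276 residues, as used throughout this section.
DHCA_HUMAN = (
    "SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQ"
    "LDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDV"
    "CTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKK"
    "GVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKAT"
    "KSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW"
)

hit = re.search(prosite_to_regex(PS00061), DHCA_HUMAN)
if hit:
    # Report 1-based, inclusive coordinates as ScanProsite does.
    print(f"{hit.start() + 1} - {hit.end()}: {hit.group()}")
```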
Short-chain dehydrogenases/reductases family signature
Description:
The short-chain dehydrogenases/reductases (SDR) family [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases [2,3,4]. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below.
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-β-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-β-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-β-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol 2-dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-β-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-β-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-α-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.5.1.33) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.5.1.34) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal β-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124w.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.
2.1.2 INTERPRO
2.1.3 BLOCK
Hits
Query=sp|P16152|DHCA_HUMAN Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-depende
Size=276 Amino Acids
Blocks Searched=27214
Alignments Done= 8220276
Cutoff combined expected value for hits= 1
Cutoff block expected value for repeats/other= 1
==============================================================================
                                                          Combined
Family                                         Strand  Blocks    E-value
IPB002347 Glucose/ribitol dehydrogenase famil  1       6 of 6    3e-36
IPB002198 Short-chain dehydrogenase/reductase  1       2 of 2    2.3e-09
IPB004358 Bacterial sensor protein C-terminal  1       1 of 4    0.01
IPB001294 Phytochrome                          1       1 of 14   0.14
IPB011489 EMI                                  1       1 of 1    0.45
==============================================================================
>IPB002347 6/6 blocks Combined E-value= 3e-36: Glucose/ribitol dehydrogenase family signature
Block       Frame  Location (aa)  Block E-value
IPB002347A  0      6-23           2e-07
IPB002347B  0      81-92          1.5e-06
IPB002347C  0      126-142        0.0028
IPB002347D  0      193-212        0.034
IPB002347E  0      218-235        0.00017
IPB002347F  0      236-256        0.074
Other reported alignments:
                   |--- 637 amino acids---|
IPB002347          A..............B.............C...D..........E...............F
sp|P16152|DHCA_HUM A::B:C::DEF

IPB002347A <->A (-3,1042):5
DHCA_HUMAN|P16152     6 VALVTGGNKGIGLAIVRD
                        ||||||||||||||||||
sp|P16152|DHCA_HUM    6 VALVTGGNKGIGLAIVRD

IPB002347B A<->B (5,365):57
DHCA_HUMAN|P16152    81 GGLDVLVNNAGI
                        ||||||||||||
sp|P16152|DHCA_HUM   81 GGLDVLVNNAGI

IPB002347C B<->C (-1,327):33
DHCA_HUMAN|P16152   126 PLIKPQGRVVNVSSIMS
                        |||||||||||||||||
sp|P16152|DHCA_HUM  126 PLIKPQGRVVNVSSIMS

IPB002347D C<->D (1,90):50
DHCA_HUMAN|P16152   193 YGVTKIGVTVLSRIHARKLS
                        ||||||||||||||||||||
sp|P16152|DHCA_HUM  193 YGVTKIGVTVLSRIHARKLS

IPB002347E D<->E (-2,260):5
DHCA_HUMAN|P16152   218 DKILLNACCPGWVRTDMA
                        ||||||||||||||||||
sp|P16152|DHCA_HUM  218 DKILLNACCPGWVRTDMA

IPB002347F E<->F (-2,381):0
DHCA_HUMAN|P16152   236 GPKATKSPEEGAETPVYLALL
                        |||||||||||||||||||||
sp|P16152|DHCA_HUM  236 GPKATKSPEEGAETPVYLALL
------------------------------------------------------------------------------
>IPB002198 2/2 blocks Combined E-value= 2.3e-09: Short-chain dehydrogenase/reductase SDR
Block       Frame  Location (aa)  Block E-value
IPB002198A  0      83-92          4.2e-05
IPB002198B  0      173-221        0.041
Other reported alignments:
                   |--- 2598 amino acids---|
IPB002198          A...........................................................B
sp|P16152|DHCA_HUM A:B

IPB002198A <->A (-3,7409):82
DHCA_HUMAN|P16152    83 LDVLVNNAGI
                        ||||||||||
sp|P16152|DHCA_HUM   83 LDVLVNNAGI

IPB002198B A<->B (-2,6175):80
DHCA_HUMAN|P16152   173 KFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKIL
                        |||||||||||||||||||||||||||||||||||||||||||||||||
sp|P16152|DHCA_HUM  173 KFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKIL
------------------------------------------------------------------------------
>IPB004358 1/4 blocks Combined E-value= 0.01: Bacterial sensor protein C-terminal signature
Block       Frame  Location (aa)  Block E-value
IPB004358C  0      12-30          0.011
Other reported alignments:
                   |--- 670 amino acids---|
IPB004358          A......................B...............C.....................D
sp|P16152|DHCA_HUM C

IPB004358C <->C (17,3694):11
Q9A7L7              381 GGTGLGLAISRDLARLMGG
                        |  | |||| ||| ||  |
sp|P16152|DHCA_HUM   12 GNKGIGLAIVRDLCRLFSG
------------------------------------------------------------------------------
>IPB001294 1/14 blocks Combined E-value= 0.14: Phytochrome
Block       Frame  Location (aa)  Block E-value
IPB001294N  0      14-37          0.15
Other reported alignments:
                   |--- 475 amino acids---|
IPB001294          A:.BB:.CC:.DD.EE:FF.GG::HHH:III::.JJJ:::KKK:::LLMM::......N
sp|P16152|DHCA_HUM :N

IPB001294N <->N (898,1187):13
Q8GV69             1075 EGLGLNICRKLVRLMNGDVQYVRE
                         | || | | | ||  |||
sp|P16152|DHCA_HUM   14 kGIGLAIvRdLcRLfSGDVvLTaR
------------------------------------------------------------------------------
>IPB011489 1/1 blocks Combined E-value= 0.45: EMI
Block       Frame  Location (aa)  Block E-value
IPB011489   0      215-231        0.46
Other reported alignments:

IPB011489 <->
EGFL7_HUMAN|Q9UHF1   78 GLAPARPRYACCPGWKR
                                 |||||| |
sp|P16152|DHCA_HUM  215 RKgDKiLLNACCPGWvR

5 possible hits reported
2.1.4 SMART
Domains within Homo sapiens protein DHCA_HUMAN (P16152)
Carbonyl reductase [NADPH] 1 (EC 1.1.1.184) (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase)
Transmembrane segments are predicted by the TMHMM2 program, coiled coil regions by the Coils2 program, segments of low compositional complexity by the SEG program, and signal peptides by the SignalP program.
2.2 MOTIF ANALYSIS
2.2.1 MEME
MOTIF 1
DKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHG
MOTIF 2
CSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHAKEGWPDSAYGVTKI
MOTIF 3
PRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQ
2.2.2 MAST
gi|118519|sp|P16152|DHCA_HUMAN
Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1) (Prostaglandin-E(2) 9-reductase) (Prostaglandin 9-ketoreductase) (15-hydroxyprostaglandin dehydrogenase [NADP+])
LENGTH = 277  COMBINED P-VALUE = 1.31e-61  E-VALUE = 2.5e-56
DIAGRAM: 218-[1]-9
[1] 5.7e-64
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

gi|1352256|sp|P48758|DHCA_MOUSE
[1] 5.7e-64
                                                                         DKILLNA
                                                                         +++++++
151  RLELQQKFRSETITEEELVGLMNKFVEDTKKGVHAEEGWPNSAYGVTKIGVTVLSRILARKLNEQRREDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEPW

gi|75061940|sp|Q5RCU5|DHCA_PONPY
[1] 2.6e-62
                                                                         DKILLNA
                                                                         + +++++
151  SPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDRILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++++++++++++++++++++++++++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW

gi|1352258|sp|P47727|DHCA_RAT
[1] 5.3e-62
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMNKFIEDAKKGVHAKEGWPNSAYGVTKIGVTVLSRIYARKLNEERREDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     ++++++++++++++++++++++++++++++++++ ++++++++
226  CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPGAEGPHGQFVQDKKVEPW

gi|1352257|sp|P47844|DHCA_RABIT
Carbonyl reductase [NADPH] 1 (NADPH-dependent carbonyl reductase 1)
LENGTH = 277  COMBINED P-VALUE = 1.55e-58  E-VALUE = 3e-53
DIAGRAM: 218-[1]-9
[1] 6.8e-61
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFRSETITEEELVGLMKKFVEDTKKGVHQTEGWPDTAYGVTKMGVTVLSRIQARHLSEHRGGDKILVNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     +++++++++++++ +++++++++++++++++++++++++++++
226  CCPGWVRTDMGGPNATKSPEEGAETPVYLALLPPDAEGPHGQFVMDKKVEQW

gi|54035740|sp|Q28960|DHCA_PIG
[1] 6.9e-59
                                                                         DKILLNA
                                                                         +++++++
151  SPELQQKFKSETITEEELVGLMNKFVEDTKNGVHRKEGWSDSTYGVTKIGVSVLSRIYARKLREQRAGDKILLNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     ++++++++++++++++++++ ++++++++++++++++++++++
226  CCPGWVRTDMGGPKAPKSPEVGAETPVYLALLPSDAEGPHGQFVTDKKVVEWGVPPESYPWVNA

gi|6014959|sp|O75828|DHC3_HUMAN
Carbonyl reductase [NADPH] 3 (NADPH-dependent carbonyl reductase 3)
LENGTH = 277  COMBINED P-VALUE = 4.79e-32  E-VALUE = 9.2e-27
DIAGRAM: 218-[1]-9
[1] 2.1e-34
                                                                         DKILLNA
                                                                         + +++++
151  SEDLQERFHSETLTEGDLVDLMKKFVEDTKNEVHEREGWPNSPYGVSKLGVTVLSRILARRLDEKRKADRILVNA

     CCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQF
     ++++ + +++ + + +++++++++++++++++ + ++
226  CCPGPVKTDMDGKDSIRTVEEGAETPVYLALLPPDATEPQGQLVHDKVVQNW
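The DIAGRAM lines above can be turned into residue coordinates with a little arithmetic. The following is a minimal Python sketch, assuming that plain numbers in the diagram are spacer lengths and that each bracketed entry such as [1] is one motif occurrence; the helper name `motif_span` is hypothetical and is not part of MAST.

```python
import re


def motif_span(diagram, seq_length):
    """Return (width, start, end) of the single motif in a MAST-style diagram.

    Assumption: plain numbers are spacer lengths and "[n]" marks one
    occurrence of motif n. Only diagrams with exactly one motif are handled.
    """
    parts = diagram.split("-")
    spacers = [int(p) for p in parts if p.isdigit()]
    motifs = [p for p in parts if re.fullmatch(r"\[\d+\]", p)]
    if len(motifs) != 1:
        raise ValueError("this sketch handles exactly one motif occurrence")
    width = seq_length - sum(spacers)   # residues not covered by spacers
    start = spacers[0] + 1              # 1-based position after the first spacer
    end = start + width - 1
    return width, start, end


# Values taken from the MAST entry for DHCA_HUMAN: DIAGRAM 218-[1]-9, LENGTH 277.
width, start, end = motif_span("218-[1]-9", 277)
print(width, start, end)
```

Under these assumptions the motif covers 50 residues, which agrees with the 7 + 43 residues of motif 1 shown across the two alignment rows.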
3. STRUCTURAL ANALYSIS TOOLS
3.1 SECONDARY STRUCTURE
3.1.1 GOR4
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
ccceeeeeeccccccceeeeeeeeeeccccceeeeecchhhhhhhhhhhhhhcccccccccchhhhhhhh
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
hhhhhhhhhcccceeeeccccceeeccccccchhhhhhhhhceeccccccccccccccccccceeeecce
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
eeeeeeccccchhhhhhhhcchhhhhhhhhhhhhhhhcceeeeeeccccceeeeeeeeeeeehhhhhhhh
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
hhhhhhcceeeeecccccceeecccccccccccccccccceeeeeccccccccccceeeeeeeeec

Sequence length : 276
GOR4 :
Alpha helix (Hh) : 78 is 28.26%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 81 is 29.35%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 117 is 42.39%
Ambiguous states (?) : 0 is 0.00%
Other states : 0 is 0.00%
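The percentages in the GOR4 summary follow directly from the per-state residue counts and the sequence length. As a small illustrative sketch (the function name `state_percentages` is an assumption for this example, not part of GOR4), the arithmetic can be reproduced in Python:

```python
def state_percentages(counts):
    """Return each state's share of the sequence as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Counts quoted in the GOR4 output: 78 helix, 81 extended strand, 117 coil.
gor4_counts = {"helix": 78, "strand": 81, "coil": 117}
gor4_pct = state_percentages(gor4_counts)
print(sum(gor4_counts.values()), gor4_pct)
```

The three counts sum to the reported sequence length of 276, so the three percentages account for the whole sequence.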
3.1.2 SOPMA
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
cttceeeeeecccttchhhhhhhhhhtttceeeeeccchhhhhhhhhhhhhttcccceeeecccchhhhh
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
hhhhhhhhtttcceeeeetttceeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhccttceeeeeech
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
hhhhhhhtcchhhhhhhhhhccchhhhhhhhhhhhhhhhttccccccccccceehhhheeeehhhhhhhh
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
hccttttceeeeeeccttceeecccccccccccccccccceeeeeccttccccccceeechhhhhh

Sequence length : 276
SOPMA :
Alpha helix (Hh) : 114 is 41.30%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 52 is 18.84%
Beta turn (Tt) : 28 is 10.14%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 82 is 29.71%
Ambiguous states (?) : 0 is 0.00%
Other states : 0 is 0.00%

Parameters :
Window width : 17
Similarity threshold : 8
Number of states : 4
3.1.3 PHD
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIR
CCCCeEEEEcCChHHHHHHHHHHHHHHCCCeEEEEEcChhHHHHHHHHHHHcCCCceEEEecCCcHHHHH
ALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSI
HHHHHHHHHcCCCEEEEEcCCceecCCCCCCceEEEEEEEEeeccchHHHHHHHhHhhCCCCcEEEEEEe
MSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARK
hhhhccccCChhHHHhcCCCCCcHHHHHHHHHHHHhhHhccCCccCCCCCCCceeeeeeeeeeehHHHHH
LSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
HhhCCCCCeEEEcCCCCCceccCCCCCCCCCCCCCCCCcEEEEEEcCCCCCCCCCCeecCCcccCC

Sequence length : 276
PHD :
Alpha helix (Hh) : 89 is 32.25%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 65 is 23.55%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 122 is 44.20%
Ambiguous states (?) : 0 is 0.00%
Other states : 0 is 0.00%
Residues with a scale reliability index of prediction of 5 and over (uppercase letters) are predicted at better than 82%.
3.1.4 NN PREDICT
Tertiary structure class: none
Sequence:
SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
Secondary structure prediction (H = helix, E = strand, - = no prediction):
----EEEEE-------EEEEHHHHHHH----EEEE---HHH-HHHHHHHH---------------HHHHHHHHHHHHHH----EEEE----HEEE----------H-HH-E----------HHH---------EEEE---HEHEE------HH----------HHHHHHHHHHHHHHHH---------------EEEEEEEEEHHHHH----H-----HHEE---------------------------HHEE-------------E------
3.1.5 JPRED
OrigSeq : SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW : OrigSeq
jalign : -----EEEEE----HHHHHHHHHHHH----EEEEE---HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE----------HH--HHHHHHHHHHHH--HHHHHHHHHHHH-----EEEEEEE----------HHHHHHHHHHHHHHHHHHHH-------------------HHHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE-------------------H-------------HHHHHHHHHH-------- : jalign
jfreq : -----EEEEE------HHHHHHHHHHH---EEEEEH-HHHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH---EEEEEH--------------HHHHHHHHHHHHHHHHHHHHHHHHHHH----EEEEEEE---------HHHHHHHHHHHHHHHHHHHHH----EEEEE-----------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE----------------HHHHHH--H-------HHHHHHHHHHH-------- : jfreq
jhmm : ----EEEEEE----HHHHHHHHHHHHH---EEEE----HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHH------EEEE--------------HHHHHHHHHHH--HHHHHHHHHHHHH-----EEEEEEEEEE---------HHHHHHHHHHHHHHHHHH----------------------HHHHHHHHHHHHHHHHHHHHH------EEEEEE-----------------HHHHHHHHHHHH---HHHHHHHHHHH-------- : jhmm
jnet : -----EEEEE----HHHHHHHHHHHHH---EEEEE--HHHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE--------------HHHHHHHHHHHHHHHHHHHHHHHHHHH---EEEEEEEEHHHH-----HHHHHHHHHHHHHHHHHHHHH-----EEE------------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE----------------HHHHHHHHHHHH----HHHHHHHHHEE-------- : jnet
jpssm : -----EEEEE----HHHHHHHHHHHHH---EEEE----HHHHHHHHHH----------H-----------------------------------------------HHH-HHHHHHHHHHHHHHHHHH----EEEEE---HHHHHH--------------HHHHHHHHHHH----------------------HHHHHHHHHHHHHHHHHHH-------EEEEEE-----------------HHHHHHHHHHHH---HHHHH---EEE-------- : jpssm
jpred : -----EEEEE----HHHHHHHHHHHHH---EEEEE---HHHHHHHHHHHHH----EEEEE-----HHHHHHHHHHHHHH----EEEEE--------------HHHHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEEEHHH-------HHHHHHHHHHHHHHHHHHHH--------------------HHHHHHHHHHHHHHHHHHHHHHHH-----EEEEEE-----------------HHHHHHHHHHH----HHHHHHHHHHH-------- : jpred
Jnet_25 : B---BBBBBBBBB-BBBBBBBBBBB--BBBBBBBBBB----B--BB--B----B-BBBBBBBBB---BB--BB--BB--BB-BBBBBBBBBBBBBBBBB-BBB-BBBBBBBBBBBBBBBBBBBBB-BB---BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB--BB-BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB-------BBBBBBBBBBBBBBBBB---B--BB--BB-BBBBBBBBB-BBB-BBBBBBB---BB-B : Jnet_25
Jnet_5 : ------BBBBBB---BB--BB--BB---B--BBBB------B--B---B------B-BB-B-B--------B---B--------BBBBBB---------------B--BB-BBB-BBBBBB-BBB--B-----BBBBBBBBBBB-----BBBBBB--BB-BBBBBBB--B------------------------BBBBBB-BBB-BBB--B--------BBBBBBBBB-B-B-------------BB--BB--B-----B--BBB-BB-------- : Jnet_5
Jnet_0 : ------BBBB------B--BB--BB------B---------B--B----------------------------------------------------------------------B---BB--B----------B------------------------------------------------------------B--B----B--BB-------------BBBB---------------------------------------B----------- : Jnet_0
Jnet Rel : 499908999728975799999999886799499755802179999875984776426677112237788888999998334440454022223455667666665811192225589999999998860697089834010111250330457789888912467787655234434222123455677236822478899999999999727189993499983588646656646981349999887886162843555277740347866388 : Jnet Rel

Notes
Key:
Colour code for alignment:
Blue - Complete identity at a position
Shades of red - The more red a position is, the higher the level of conservation of chemical properties of the amino acids
jalign - Jnet alignment prediction
jfreq - Jnet PSIBLAST frequency profile prediction
jhmm - Jnet hmm profile prediction
jnet - Jnet prediction
jpssm - Jnet PSIBLAST pssm profile prediction
jpred - Consensus prediction over all methods
MCoil - MultiCoil prediction (and dimer and trimer predictions)
Lupas - Lupas Coil prediction (window size of 14, 21 and 28)
Note on coiled coil predictions:
- = less than 50% probability
c = between 50% and 90% probability
C = greater than 90% probability
Jnet_25 - Jnet prediction of burial, less than 25% solvent accessibility
Jnet_5 - Jnet prediction of burial, less than 5% exposure
Jnet_0 - Jnet prediction of burial, 0% exposure
Jnet Rel - Jnet prediction of prediction accuracy, ranges from 0 to 9, bigger is better
3.1.6 PREDICT PROTEIN
PROF results (normal)
AA SSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
PROF_sec EEEEEE HHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHHH HHHHHHHHHHHHHHH EEEEEE HHHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHHHH EEEEE HHHH HHHHHHHHHH
SUB_sec LLL..EEEE..LL.HHHHHHHHHHHH.LLL.EEEE.LL..HHHHHHHHHH..LLL.EEEE..LLLHHHHHHHHHHHHHH.LL..EEE........LLL..LL......HH......HHHHHHHHHHH...LL.EEEE.......LLL.....HH.HHHHHHHHHHHHHHH...LL.....L.LL..LL........HHHHHHHHHHHHHHHHH...LLL.EEE.........LLLLLLLL........L....LLL..H............LLLLL
O_3_acc bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
P_3_acc e e bbbbbbb bbb bbb bbee e bbbbb e eebeebbe b eeebeb bb bebeeee be bbe b ee eebbbbbbbbbb e e ee e b bb bbb bbb bb bbe b eeb bbbbbbbbb e ee bb bbb bbb bbb bb ebbee ee ee e eeee e eb bb bbb bbb bbb bbee eee bbbbbbbbbbb beb eee eeeeee e b ee eebbebbbbbb eeeeeee
Rel_acc 705136999632021472289339852201397943332436245454835143433312120251616349354945411230999633111210001213132232422022235234934934361331399475411111011011133632286427532173314532212123000135110213112421832265306933253222312307674312012020101102332521022000102223221226304321410335
SUB_acc e.e..bbbbb.....bb..bb..bbe.....bbbb....e.b.ebbeib.e.e.e.........e.e.b.ib.eibiee.....bbbb....................b.......b..bb.ib.e.b.....bbbbbb..............b...bbi.bb...b...be.............e.........b..b...bb..bb...b.........bbbb..................e...................b..b...e....e
Sequence Details 1WMA
Chain A, representative of identical chains Chain A
Description Carbonyl reductase [NADPH] 1
Type polypeptide(L)
Polymer Id 1
Number of residues 276
Domains 1WMAA0: dp domain 1WMAA0
Sequence and Secondary Structure
[Figure: sequence and secondary structure diagram for 1WMA chain A. Key: extended strand, turn, disulfide bond, alpha helix, 310 helix, pi helix. Greyed out residues have no structural information.]
3.2 PROTEIN VISUALISATION TOOL
3.2.1 RASMOL
Ball and Stick
Cartoon
Strands
Space fill
4. SEQUENCE ANALYSIS
4.1 ORF PREDICTION TOOL
ORF FINDER
 94 atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag
     M  S  S  G  I  H  V  A  L  V  T  G  G  N  K
139 ggcatcggcttggccatcgtgcgcgacctgtgccggctgttctcg
     G  I  G  L  A  I  V  R  D  L  C  R  L  F  S
184 ggggacgtggtgctcacggcgcgggacgtgacgcggggccaggcg
     G  D  V  V  L  T  A  R  D  V  T  R  G  Q  A
229 gccgtacagcagctgcaggcggagggcctgagcccgcgcttccac
     A  V  Q  Q  L  Q  A  E  G  L  S  P  R  F  H
274 cagctggacatcgacgatctgcagagcatccgcgccctgcgcgac
     Q  L  D  I  D  D  L  Q  S  I  R  A  L  R  D
319 ttcctgcgcaaggagtacgggggcctggacgtgctggtcaacaac
     F  L  R  K  E  Y  G  G  L  D  V  L  V  N  N
364 gcgggcatcgccttcaaggttgctgatcccacaccctttcatatt
     A  G  I  A  F  K  V  A  D  P  T  P  F  H  I
409 caagctgaagtgacgatgaaaacaaatttctttggtacccgagat
     Q  A  E  V  T  M  K  T  N  F  F  G  T  R  D
454 gtgtgcacagaattactccctctaataaaaccccaagggagagtg
     V  C  T  E  L  L  P  L  I  K  P  Q  G  R  V
499 gtgaacgtatctagcatcatgagcgtcagagcccttaaaagctgc
     V  N  V  S  S  I  M  S  V  R  A  L  K  S  C
544 agcccagagctgcagcagaagttccgcagtgagaccatcactgag
     S  P  E  L  Q  Q  K  F  R  S  E  T  I  T  E
589 gaggagctggtggggctcatgaacaagtttgtggaggatacaaag
     E  E  L  V  G  L  M  N  K  F  V  E  D  T  K
634 aagggagtgcaccagaaggagggctggcccagcagcgcatacggg
     K  G  V  H  Q  K  E  G  W  P  S  S  A  Y  G
679 gtgacgaagattggcgtcaccgttctgtccaggatccacgccagg
     V  T  K  I  G  V  T  V  L  S  R  I  H  A  R
724 aaactgagtgagcagaggaaaggggacaagatcctcctgaatgcc
     K  L  S  E  Q  R  K  G  D  K  I  L  L  N  A
769 tgctgcccagggtgggtgagaactgacatggcgggacccaaggcc
     C  C  P  G  W  V  R  T  D  M  A  G  P  K  A
814 accaagagcccagaagaaggtgcagagacccctgtgtacttggcc
     T  K  S  P  E  E  G  A  E  T  P  V  Y  L  A
859 cttttgcccccagatgctgagggtccccatggacaatttgtttca
     L  L  P  P  D  A  E  G  P  H  G  Q  F  V  S
904 gagaagagagttgaacagtggtga 927
     E  K  R  V  E  Q  W  *
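The codon-by-codon translation in the ORF Finder output can be checked with a short script. This is a minimal sketch that assumes the standard genetic code; the `translate` helper is written for this illustration and is not the ORF Finder implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 0, stopping after the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First and last rows of the ORF Finder output (positions 94-138 and 904-927).
first_row = "atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag"
last_row = "gagaagagagttgaacagtggtga"
print(translate(first_row))  # MSSGIHVALVTGGNK
print(translate(last_row))   # EKRVEQW*

# The ORF spans 94..927, i.e. 834 nt = 278 codons including the stop codon.
orf_codons = (927 - 94 + 1) // 3
print(orf_codons)            # 278
```

The 278 codons correspond to 277 residues plus the stop codon, which matches the length of 277 reported by MAST; the 276-residue sequence used elsewhere in this report lacks the initiator methionine.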
4.2 SPLICE SITE PREDICTION
NetGene2
The sequence: Sequence has the following composition:
Length: 1209 nucleotides.
25.0% A, 25.3% C, 28.9% G, 20.8% T, 0.0% X, 54.2% G+C

Donor splice sites, direct strand
---------------------------------
pos 5'->3'  phase  strand  confidence  5' exon intron 3'
780         2      +       0.35        GCTGCCCAGG^GTGGGTGAGA
833         1      +       0.36        CCAGAAGAAG^GTGCAGAGAC
924         2      +       0.80        TTGAACAGTG^GTGAGCTGGG

Donor splice sites, complement strand
-------------------------------------
pos 3'->5'  pos 5'->3'  phase  strand  confidence  5' exon intron 3'
1055        155         2      -       0.41        TAGTACATTA^GTGAGTGCTA
966         244         1      -       0.46        TCAGGACAAG^GTACAAAATG
200         1010        1      -       0.39        GTCCCGCGCC^GTGAGCACCA

Acceptor splice sites, direct strand
------------------------------------
pos 5'->3'  phase  strand  confidence  5' intron exon 3'
69          0      +       0.00        CTCCACGCAG^GTGTTCCGCG
115         1      +       0.18        ATCCATGTAG^CGCTGGTGAC
512         2      +       0.19        ACGTATCTAG^CATCATGAGC
521         2      +       0.19        GCATCATGAG^CGTCAGAGCC
527         2      +       0.19        TGAGCGTCAG^AGCCCTTAAA
529         1      +       0.19        AGCGTCAGAG^CCCTTAAAAG
539         2      +       0.19        CCCTTAAAAG^CTGCAGCCCA
545         2      +       0.19        AAAGCTGCAG^CCCAGAGCTG
550         1      +       0.18        TGCAGCCCAG^AGCTGCAGCA
552         0      +       0.18        CAGCCCAGAG^CTGCAGCAGA
558         0      +       0.07        AGAGCTGCAG^CAGAAGTTCC
871         1      +       0.33        TTGCCCCCAG^ATGCTGAGGG

Acceptor splice sites, complement strand
----------------------------------------
pos 3'->5'  pos 5'->3'  phase  strand  confidence  5' intron exon 3'
771         439         1      -       0.07        CCCTGGGCAG^CAGGCATTCA
768         442         1      -       0.17        TGGGCAGCAG^GCATTCAGGA
760         450         0      -       0.18        AGGCATTCAG^GAGGATCTTG
757         453         0      -       0.19        CATTCAGGAG^GATCTTGTCC
727         483         2      -       0.76        GCTCACTCAG^TTTCCTGGCG
703         507         2      -       0.17        TCCTGGACAG^AACGGTGACG
584         626         1      -       0.33        TCCTCCTCAG^TGATGGTCTC
438         772         1      -       0.07        GGTACCAAAG^AAATTTGTTT
413         797         2      -       0.92        GTCACTTCAG^CTTGAATATG
399         811         1      -       0.34        AATATGAAAG^GGTGTGGGAT
386         824         2      -       0.19        GTGGGATCAG^CAACCTTGAA
375         835         1      -       0.18        AACCTTGAAG^GCGATGCCCG
125         1085        1      -       0.16        TTGCCTCCAG^TCACCAGCGC
------------------------------------------------------------------------------
CUTOFF values used for confidence:
Highly confident donor sites (H): 95.0 %
Nearly all true donor sites: 50.0 %
Highly confident acceptor sites (H): 95.0 %
Nearly all true acceptor sites: 20.0 %
Graphics showing the prediction output
Direct strand ( + strand)
Complement strand ( - strand)
4.3 GENE FINDER
GENSCAN
GENSCAN 1.0 Date run: 13-Jun-106 Time: 05:17:52
Sequence 05:17:52 : 1209 bp : 54.18% C+G : Isochore 3 (51 - 57 C+G%)
Parameter matrix: HumanIso.smat
Predicted genes/exons:
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 1.01 Term +     70    927  858  0  0  113   49  1199 0.529 111.73
 1.02 PlyA +   1188   1193    6                               1.05
Predicted peptide sequence(s):
>05:17:52|GENSCAN_predicted_peptide_1|285_aa
VFRAPRSAMSSGIHVALVTGGNKGIGLAIVRDLCRLFSGDVVLTARDVTRGQAAVQQLQAEGLSPRFHQLDIDDLQSIRALRDFLRKEYGGLDVLVNNAGIAFKVADPTPFHIQAEVTMKTNFFGTRDVCTELLPLIKPQGRVVNVSSIMSVRALKSCSPELQQKFRSETITEEELVGLMNKFVEDTKKGVHQKEGWPSSAYGVTKIGVTVLSRIHARKLSEQRKGDKILLNACCPGWVRTDMAGPKATKSPEEGAETPVYLALLPPDAEGPHGQFVSEKRVEQW
Explanation
Gn.Ex : gene number, exon number (for reference)
Type : Init = Initial exon (ATG to 5' splice site)
       Intr = Internal exon (3' splice site to 5' splice site)
       Term = Terminal exon (3' splice site to stop codon)
       Sngl = Single-exon gene (ATG to stop)
       Prom = Promoter (TATA box / initiation site)
       PlyA = poly-A signal (consensus: AATAAA)
S : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End : end point of exon or signal (numbered on input strand)
Len : length of exon or signal (bp)
Fr : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph : net phase of exon (exon length modulo 3)
I/Ac : initiation signal or 3' splice site score (tenth bit units)
Do/T : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P : probability of exon (sum over all parses containing exon)
Tscr : exon score (depends on length, I/Ac, Do/T and CodRg scores)
Comments
The SCORE of a predicted feature (e.g., exon or splice site) is a log-odds measure of the quality of the feature based on local sequence properties. For example, a predicted 5' splice site with score > 100 is strong; 50-100 is moderate; 0-50 is weak; and below 0 is poor (more than likely not a real donor site).
The PROBABILITY of a predicted exon is the estimated probability under GENSCAN's model of genomic sequence structure that the exon is correct. This probability depends in general on global as well as local sequence properties, e.g., it depends on how well the exon fits with neighboring exons. It has been shown that predicted exons with higher probabilities are more likely to be correct than those with lower probabilities.
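The Len, Fr and Ph columns and the score bands described in the explanation can be recomputed from the Begin and End coordinates. The sketch below is illustrative only: the function names are hypothetical, and the handling of scores that fall exactly on a band boundary (50 or 100) is an assumption, since the comment above does not specify it.

```python
def exon_summary(begin, end):
    """Recompute Len, Fr and Ph for a forward-strand exon from its coordinates."""
    length = end - begin + 1
    return {
        "len": length,
        "frame": end % 3,     # Fr: a codon ending at x has frame x mod 3
        "phase": length % 3,  # Ph: exon length modulo 3
    }


def donor_score_class(score):
    """Map a 5' splice site score to the bands quoted in the GENSCAN comments.

    Assumption: 100 counts as moderate and 50 as moderate, 0 as weak.
    """
    if score > 100:
        return "strong"
    if score >= 50:
        return "moderate"
    if score >= 0:
        return "weak"
    return "poor"


# Exon 1.01 from the GENSCAN table: Term, + strand, 70..927.
print(exon_summary(70, 927))  # {'len': 858, 'frame': 0, 'phase': 0}

# 858 nt = 286 codons including the stop codon, i.e. a 285-residue peptide.
print(858 // 3 - 1)           # 285
```

These values agree with the Len, Fr and Ph columns of the table and with the 285_aa label on the predicted peptide.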
4.4 RESTRICTION MAPPING
NEBcutter
Display:
- NEB single cutter restriction enzymes
- Main non-overlapping, min. 100 aa ORFs
GC=54%, AT=46%
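The GC figure reported by NEBcutter is consistent with the NetGene2 composition (25.3% C + 28.9% G = 54.2% G+C). As a hedged sketch of how such a value is obtained from a raw sequence, the hypothetical helper `gc_percent` below counts G and C bases; it is applied here only to the first row of the ORF Finder output, not to the full 1209-nt sequence.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Consistency check on the reported composition: C + G should equal G+C.
reported_gc = round(25.3 + 28.9, 1)
print(reported_gc)  # 54.2

# GC content of the first 45 nt of the coding region (ORF Finder row 94).
first_row = "atgtcgtccggcatccatgtagcgctggtgactggaggcaacaag"
print(round(gc_percent(first_row), 2))
```

A short window such as this one need not match the whole-sequence value; the full sequence would have to be supplied to reproduce the 54% figure.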
DISCUSSION
A. Database retrieval
The ExPASy primary accession number is P16152 and the protein length is 276 aa. Synonyms for carbonyl reductase [NADPH] 1 are NADPH-dependent carbonyl reductase 1, prostaglandin-E(2) 9-reductase, prostaglandin 9-ketoreductase and 15-hydroxyprostaglandin dehydrogenase [NADP+].
FUNCTION: Catalyzes the reduction of a wide variety of carbonyl compounds, including the antitumor anthracycline antibiotics. Can convert prostaglandin E2 to prostaglandin F2-alpha.
CATALYTIC ACTIVITY:
1) R-CHOH-R' + NADP+ = R-CO-R' + NADPH.
2) (13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
3) (5Z,13E)-(15S)-11-alpha,15-dihydroxy-9-oxoprost-13-enoate + NADP+ = (5Z,13E)-11-alpha-hydroxy-9,15-dioxoprost-13-enoate + NADPH.
According to NCBI and EMBL, the nucleotide sequence length is 1209 bp and the coding sequence spans positions 94 to 927. The nucleotide accession number is J04056.
The PDB ID is 1WMA. The structural name according to PDB is “Hydroxy-PP”. The molecular description is Carbonyl reductase [NADPH] 1. The functional class is oxidoreductase. Molecular function: oxidoreductase activity.
According to the ENZYME database, the EC number is 1.1.1.184 and the reaction is R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH.
The approved HGNC symbol for this gene is CBR1.
According to GeneCards, the gene is located on chromosome 21 at 21q22.13. It extends from 36,364,191 bp to 36,367,332 bp from pter, so the size of the gene is 3141 bases. The gene lies on the plus strand.
GenAtlas reports the localization of this enzyme as intracellular, cytoplasm, cytosolic.
According to the ENSEMBL result, this gene is found on chromosome 21 at location 36,364,191-36,367,332. It has only one transcription site. The start of this gene is located in contig AP000688.1.1.171703.
Based on the Pfam result, this family is a member of the FAD/NAD(P)-binding Rossmann fold superfamily clan. This clan includes the following Pfam members: Trp_halogenase; TrkA_N. The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. Because the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least 2 domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains.
B. Tools and techniques
1. Homology and similarity
In BLAST similarity searches, similar sequences were found in Mus musculus, rat and rabbit, with similarities of 92%, 92% and 90% respectively. In FASTA similarity searches, similar sequences were found in bovine, rat and rabbit, with similarities of 95.652%, 94.565% and 94.834% respectively.
DHCA_HUMAN and DCXR_HUMAN were compared using the EMBOSS tool. The similarity is 34.1% in the global alignment and 38.4% in the local alignment.
Multiple sequence alignment with ClustalW gave an alignment score of 14258. The alignment score in T-Coffee is 81. In the alignment, “*” marks columns where all residues are identical and “.” marks columns with weakly conserved substitutions.
2. Functional analysis
ScanProsite results revealed that the short-chain dehydrogenases/reductases family signature is found at residues 180-208. The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. Because the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below.
Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
D-β-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
The graphical representation of this pattern is shown in the InterPro result.
In the Blocks result, 5 possible hits were reported. Six blocks are present in the Glucose/ribitol dehydrogenase family signature.
>IPB002347 6/6 blocks Combined E-value= 3e-36: Glucose/ribitol dehydrogenase family signature
Block       Frame  Location (aa)  Block E-value
IPB002347A  0      6-23           2e-07
IPB002347B  0      81-92          1.5e-06
IPB002347C  0      126-142        0.0028
IPB002347D  0      193-212        0.034
IPB002347E  0      218-235        0.00017
IPB002347F  0      236-256        0.074
The graphical representation of profiles in the SMART result confirmed the presence of Carbonyl reductase [NADPH] 1.
According to the MEME result, three motifs are present in Carbonyl reductase [NADPH] 1. The details of these three motif regions were collected from the MEME result and submitted to MAST. These motifs were also found in other enzymes having the same domains.
3. Structural analysis
Secondary structure prediction was done using GOR IV, SOPMA and PHD. These tools predict an alpha helix content between 28.26% and 41.30% and an extended strand (beta) content between 18.84% and 29.35%. These results are consistent with the NNPredict and Jpred results.
According to the sequence details from the PDB database, eight beta strands, eleven helices and nine turns are present in Carbonyl reductase [NADPH] 1.
4. Sequence analysis
Based on ORF prediction, the coding region extends from 94 to 927. NetGene2 predicted three donor splice sites and twelve acceptor splice sites in the direct strand, and three donor splice sites and thirteen acceptor splice sites in the complement strand.
GENSCAN predicted a terminal exon (Term) from 70 to 927 on the plus strand, which contains the ORF Finder coding region (94 to 927), and a poly-A signal at 1188-1193.
According to the NEBcutter restriction prediction tool, “GC” is 54% and “AT” is 46%. A recognition site for BssHII is found toward the N-terminal end and a recognition site for PspOMI toward the C-terminal end.
CONCLUSIONS
Results from the different databases show that carbonyl reductase catalyzes the reduction of a wide variety of carbonyl compounds, including the antitumor anthracycline antibiotics, and can convert prostaglandin E2 to prostaglandin F2-alpha. The gene is located on chromosome 21, from 36,364,191 bp to 36,367,332 bp from pter, on the plus strand. The size of the gene is 3141 bases. The protein length is 276 aa. The approved HGNC symbol for this gene is CBR1. The functional class is oxidoreductase. Reaction catalyzed: R-CHOH-R' + NADP(+) <=> R-CO-R' + NADPH.
Sequence, structure and functional analysis of carbonyl reductase showed that similar sequences are found in Mus musculus, rat and rabbit, with similarities of 92%, 92% and 90% respectively (based on the BLAST result). In FASTA similarity searches, similar sequences are found in bovine, rat and rabbit, with similarities of 95.652%, 94.565% and 94.834% respectively. The short-chain dehydrogenases/reductases family signature is found at residues 180-208. The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. Eight beta strands, eleven helices and nine turns are present in Carbonyl reductase [NADPH] 1. GENSCAN predicted a terminal exon from 70 to 927, which contains the ORF Finder coding region (94 to 927). “GC” is 54% and “AT” is 46%.
REFERENCES
1. Avramopoulos, D.; Cox, T.; Forrest, G. L.; Chakravarti, A.;
Antonarakis, S. E. :
Linkage mapping of the carbonyl reductase (CBR) gene on human
chromosome 21 using a DNA polymorphism in the 3-prime
untranslated region. Genomics 13: 447-448, 1992.
2. Forrest, G. L.; Akman, S.; Krutzik, S.; Paxton, R. J.; Sparkes, R.
S.; Doroshow, J.; Felsted, R. L.; Glover, C. J.; Mohandas, T.;
Bachur, N. R. :
Induction of a human carbonyl reductase gene located on
chromosome 21. Biochim. Biophys. Acta 1048: 149-155, 1990.
3. Lemieux, N.; Malfoy, B.; Forrest, G. L. :
Human carbonyl reductase (CBR) localized to band 21q22.1 by high-
resolution fluorescence in situ hybridization displays gene
dosage effects in trisomy 21 cells. Genomics 15: 169-172, 1993.
4. Watanabe, K.; Sugawara, C.; Ono, A.; Fukuzumi, Y.; Itakura, S.;
Yamazaki, M.; Tashiro, H.; Osoegawa, K.; Soeda, E.; Nomura, T. :
Mapping of a novel human carbonyl reductase, CBR3, and ribosomal
pseudogenes to human chromosome 21q22.2. Genomics 52: 95-100,
1998.
5. Wei, J.; Dlouhy, S. R.; Hara, A.; Ghetti, B.; Hodes, M. E. :
Cloning a cDNA for carbonyl reductase (Cbr) from mouse
cerebellum: murine genes that express Cbr map to chromosomes 16
and 11. Genomics 34: 147-148, 1996.
6. Wermuth, B.; Bohren, K. M.; Heinemann, G.; von Wartburg, J.-P.;
Gabbay, K. H. :
Human carbonyl reductase: nucleotide sequence analysis of a cDNA
and amino acid sequence of the encoded protein. J. Biol. Chem.
263: 16185-16188, 1988.