1
Databases for Microbes and Man
Lynda B.M. Ellis
Laboratory Medicine and Pathology
University of Minnesota
Encoding Metabolic Logic: Predicting Biodegradation
2
The University of Minnesota Biocatalysis and Biodegradation Database (UM-BBD)
Freely available on the World Wide Web: http://umbbd.ahc.umn.edu/
Focus:
  Microbial specialized catabolism
  Emphasizing environmental pollutants
Represents metabolism in the form of metabolic pathways
From 1,2-Dichloroethane to 2-Chloroethanol
Graphic of the reaction.
Medline reference: Verschueren KH, Seljee F, Rozeboom HJ, Kalk KH, Dijkstra BW. Nature (1993) 363(6431): 693-8.
Search Medline titles for haloalkane dehalogenase. 64 citations found on August 14, 2005.
1,2-Dichloroethane + H2O -----> 2-Chloroethanol + HCl
Enzyme: haloalkane dehalogenase (EC 3.8.1.5)
Search GenBank, 89 hits on Aug. 14, 2005
Links: Kyoto, ExPASy
Generate a pathway starting from this reaction.
UM-BBD Biotransformation rules in accord with this reaction:
Alkyl halide -----> Alcohol (bt0022)
[1,2-Dichloroethane] [BBD Main Menu]
Page Author(s): Renhao Li
August 14, 2005
Contact Us
This is the UM-BBD reaction, reacID# r0001.
It was generated on August 24, 2005 2:42:28 PM CDT.
© 2005, University of Minnesota.
Reaction Page
4
Hypothesis
~10^7 compounds are in the environment.
The UM-BBD will never contain the metabolism of all of them.
UM-BBD data might be used to predict reasonable biodegradation schemes for compounds it does not contain.
Prediction Overview
http://umbbd.ahc.umn.edu/predict/
5
Biotransformation Rule
Rule bt0003
[Pathway Prediction Engine] [All Rules List] [BBD Main Menu]
Description:
bt0003: Aldehyde -> Carboxylate
UM-BBD Reaction(s):
Chloroacetaldehyde -----> Chloroacetate (reacID# r0003)
3-Chloroallyl aldehyde -----> trans-3-Chloroacrylic acid (reacID# r0693)
3-Chloroallyl aldehyde -----> cis-3-Chloroacrylic acid (reacID# r0692)
Dodecanal -----> Lauric acid (reacID# r0605)
3-Hydroxybenzaldehyde -----> 3-Hydroxybenzoate (reacID# r0401)
3-Methylbenzaldehyde -----> m-Methylbenzoate (reacID# r0214)
2-Methylbenzaldehyde -----> o-Methylbenzoate (reacID# r0222)
1-Octanal -----> Octanoate (reacID# r0003)
Perillyl aldehyde -----> Perillic acid (reacID# r0730)
p-Tolualdehyde -----> p-Toluate (reacID# r0177)
…
Contact Us if you have any comments on rule bt0003.
Biotransformation Rule-base
bt0001: primary alcohol → aldehyde
bt0002: secondary alcohol → ketone
bt0003: aldehyde → carboxylate
bt0005: vic-unsubstituted aromatic → cis-dihydroxydihydroaromatic
bt0008: vic-dihydroxybenzenoid → extradiol ring cleavage
…
bt0255: vic-dihydrodihydroxyaromatic → vic-dihydroxyaromatic
bt0259: vic-dihydroxyaromatic → intradiol ring cleavage
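A minimal sketch of how such a rule base could be chained into a predicted pathway. It is a toy under loud assumptions: plain functional-group labels stand in for real substructure matching, only three of the rules listed above are included, and `predict_steps` and `max_steps` are hypothetical names, not part of the UM-BBD software.

```python
# Toy rule base: (substrate group, product group), keyed by rule ID from the slide.
# Assumption: string labels replace the real chemical substructure patterns.
RULES = {
    "bt0001": ("primary alcohol", "aldehyde"),
    "bt0002": ("secondary alcohol", "ketone"),
    "bt0003": ("aldehyde", "carboxylate"),
}


def predict_steps(start_group, rules=RULES, max_steps=5):
    """Follow matching rules from a starting group.

    Returns a list of (rule_id, product_group) pairs, stopping when no rule
    matches or max_steps is reached.
    """
    path = []
    current = start_group
    for _ in range(max_steps):
        match = next(
            ((rule_id, product)
             for rule_id, (substrate, product) in sorted(rules.items())
             if substrate == current),
            None,
        )
        if match is None:
            break
        path.append(match)
        current = match[1]
    return path


print(predict_steps("primary alcohol"))
# [('bt0001', 'aldehyde'), ('bt0003', 'carboxylate')]
```

A real predictor would branch when several rules match one compound; this sketch follows a single chain only.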
6
Pathway Prediction Example
After five steps
Start
Evaluation
Can the Pathway Prediction System (PPS) predict known UM-BBD pathways?
Used 117 UM-BBD compounds, containing only C, H, O, N, S, P and/or X, that initiate pathways of 2 or more reactions, with 629 pathway branches
The present system can duplicate at least one known biodegradation pathway for 115 of these compounds
Can predict 503 (80%) of the pathway branches
Most predicted pathways that did not completely duplicate known metabolism were plausible
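The evaluation figures can be checked with a few lines of arithmetic. The counts come from the slide; the helper name `percent` is only illustrative.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


compounds_total, compounds_duplicated = 117, 115
branches_total, branches_predicted = 629, 503

print(percent(compounds_duplicated, compounds_total))  # 98
print(percent(branches_predicted, branches_total))     # 80
```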
7
Complexity - 100 g of soil
• 100,000,000,000 bacteria
• 100,000 natural chemicals
• 10,000 bacterial species
• 1 new chemical
Guidance for PPS Users
PPS Aerobic Likelihood
Standard conditions:
  Aerobic
  Soil (moderate moisture) or water
  25 °C
  No other toxic or competing chemicals
Two or more experts prioritize rules as: Very Likely, Likely, Neutral, Unlikely, Very Unlikely, Unknown
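One way such likelihood labels could be used to order candidate rules, as a sketch only. The six categories are from the slide; placing "Unknown" last, the rule names `rule_a` to `rule_d`, and their ratings are invented for illustration and are not actual UM-BBD assignments.

```python
# Categories from the slide, most to least likely.
# Assumption: "Unknown" sorts after "Very Unlikely".
LIKELIHOOD_ORDER = [
    "Very Likely", "Likely", "Neutral", "Unlikely", "Very Unlikely", "Unknown",
]


def rank_rules(ratings):
    """Return rule IDs sorted from most to least likely."""
    return sorted(ratings, key=lambda rule_id: LIKELIHOOD_ORDER.index(ratings[rule_id]))


# Hypothetical ratings, for illustration only.
example = {
    "rule_a": "Unlikely",
    "rule_b": "Very Likely",
    "rule_c": "Neutral",
    "rule_d": "Unknown",
}
print(rank_rules(example))
# ['rule_b', 'rule_c', 'rule_a', 'rule_d']
```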
8
Aerobic Likelihood Added
Future Work
Rule development
  250 rules, and growing
  New pathways = new rules
Collaboration with LHASA/METEOR
REACTOR GUI for rule generation
9
People
Top: Philip Judson, Larry Wackett, Yogesh Kale, Dave Roe, Jack Richman
Bottom: Lynda Ellis, Carla Essenberg
Not shown: John Carlis, John Schrom
Acknowledgements/Reference
US Department of Energy
DOE DE-FG02-01ER63268
LHASA, Ltd. (METEOR)
http://www.lhasalimited.org/
Ellis, L.B.M., Roe, D., Wackett, L.P. (2006) “UM-BBD: The First Decade” Nucleic Acids Research, in press.
10
Seeking the Vertebrate Secretome
Secreted Proteins
Proteins synthesized within the cell and actively transported out of the cell
Have an active role in cell-cell interactions
Make up ≈10% of the proteome
For two decades, attractive targets for computational identification
11
Signal-Mediated CoTranslational Translocation
Robert J. Huskey, University of Virginia
http://www.people.virginia.edu/~rjh9u/secrprotsyn.html
Signal Peptide Structure
12
????Just compute and solve?
Expressed Sequence Tags (ESTs)
Short coding nucleotide sequence fragments from cDNA clones
Expressed sequences
High throughput, sequencing coverage mostly 1x
Estimated 2% error rate
Sequences are often incomplete
  Often start at the 3’ end, are 5’ truncated
  5’ end translates to N-terminus of protein
EST consensus sequences contain multiple, overlapping ESTs, increasing sequence quality
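To illustrate why overlapping ESTs raise sequence quality, here is a sketch of a per-column majority vote over reads that are assumed to be already aligned, with `-` marking positions a read does not cover. The reads are made up, and real EST assemblers do considerably more than this.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base per column; 'N' where no read covers the position."""
    length = max(len(read) for read in aligned_reads)
    bases = []
    for i in range(length):
        column = [read[i] for read in aligned_reads if i < len(read) and read[i] != "-"]
        bases.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(bases)


# Hypothetical reads; the second carries a single-base error at position 5.
reads = [
    "ATGGCTA---",
    "--GGCAAGTC",
    "ATGGCTAGT-",
]
print(consensus(reads))  # ATGGCTAGTC
```

With 1x coverage the error in the second read would pass through unchanged; with three overlapping reads it is outvoted.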
13
N-terminal truncation and SP prediction
Signal peptide prediction programs assume N-terminally-complete sequences
  N-terminal transmembrane domains; false positives
Problems with N-terminal truncation
  Signal peptides may be truncated; more false negatives
  Internal transmembrane domains may appear to be at the N-terminus; more false positives
Signal peptide prediction programs are less accurate when used on EST databases
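The false-positive problem can be shown with a toy scan for the first hydrophobic stretch in a protein sequence. Everything here is an assumption made for illustration: the residue set, the `min_len` threshold, the function name, and the example sequence. Real predictors such as TargetP use trained models, not a rule like this.

```python
# Assumption: a crude set of hydrophobic residues.
HYDROPHOBIC = set("AVLIMFWC")


def first_hydrophobic_run(protein, min_len=8):
    """Return the start index of the first run of >= min_len hydrophobic residues, or None."""
    run_start = None
    for i, residue in enumerate(protein):
        if residue in HYDROPHOBIC:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_len:
                return run_start
        else:
            run_start = None
    return None


# Hypothetical protein with an internal hydrophobic segment.
full_length = "MSDKTEQRSPGDNKELLVVALLAIVFLGSRKDE"
truncated = full_length[14:]  # simulates a 5'-truncated EST translation

print(first_hydrophobic_run(full_length))  # 15
print(first_hydrophobic_run(truncated))    # 1
```

In the full-length sequence the hydrophobic segment is internal (index 15). After truncation it sits at index 1 and could be mistaken for an N-terminal signal peptide.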
Objectives
Identify secreted proteins in vertebrate Expressed Sequence Tag databases
Supply targets supporting embryonic growth and development studies in
  zebrafish
  cattle
  swine
14
Methods
Reference secretome construction
  identified by localization prediction software
Homology modeling
  comparison of query EST to reference secretomes
Signal peptide prediction
  query sequences and their reference homologs
Alignment analysis
Study-specific filtering
Materials
Reference proteomes
  Obtained from NCBI, IPI, DOE JGI, and PEDANT
  H. sapiens, Fugu, D. melanogaster
TargetP – subcellular localization prediction
  Predictions based on N-terminal sequence
  Predicts secretory pathway signal peptides
15
Vertebrate Secretome Database
  MySQL relational dB
  Interfaces with analysis software
  Allows complex queries
www.secretomes.umn.edu
Partial Porcine Secretome
Reference Secreted Sets
Homologs ATG aligned signal peptide
ATG, aligned, signal peptide
H. sapiens
  IPI            2792  2442  1007  522  294
  RefSeq         2379  2077   820  450  228
  GenScan        2646  2291   888  462  247
T. rubripes      2121  1824   570  333  148
Non-redundant    3934  3422  1487  626  352
16
Validation Targets
352 putative secreted porcine sequences with complete N-termini
190 had N-terminal MARC clones
46 of these chosen for validation
Enriched with sequences of unknown function
Validation Summary
46 sequences selected for CTT validation
40 could be translated
34 of these (85%) were secreted
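The validation funnel can be tallied step by step. The counts are from the slides; the stage labels are paraphrased and `step_rates` is an illustrative helper name.

```python
# Stage counts from the Validation Targets and Validation Summary slides.
funnel = [
    ("putative secreted, complete N-termini", 352),
    ("with N-terminal MARC clones", 190),
    ("chosen for validation", 46),
    ("could be translated", 40),
    ("secreted", 34),
]


def step_rates(stages):
    """For each stage after the first, return (name, count, percent of the previous stage)."""
    return [
        (name, count, round(100 * count / prev_count))
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, count, pct in step_rates(funnel):
    print(f"{name}: {count} ({pct}% of previous stage)")
```

The last step reproduces the 85% figure: 34 of the 40 translatable sequences were secreted.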
17
Future Directions
Improve translation start site identification
Add reference secretomes - protochordates?
Differentiate between secreted receptors and ligands
Improve homology selection criteria
Explore secretion prediction without using signal sequences
Acknowledgements
Eric Klee (now Mayo Clinic)
Liz Saftalov (2003 BSI intern)
Kyong-Jin Shim
Stephen Ekker
Michael Pickart (now UW-Stout)
Michelle Knowlton
Scott Fahrenkrug
Dan Carlson
NIH R01-GM63904, NLM TG-0704l, NIH T32-DE07288-07, NSF/NIBIB EEC-0234112