Bio- and Medical-Informatics
Presenter: Russell Greiner
Vision Statement
Helping the world understand
… and make informed
decisions.*
Potential beneficiaries:
• biological and medical researchers,
• practicing clinicians, and
• the people they serve.
bio- and medical- informatics
* data
Motivation
• High impact on bio-science and society
• Local bioinformatics expertise
• ML has a key role:
  • actual patterns (predictors, …) not known
  • lots of data
• Challenging ML problems
  • data is high dimensional, noisy, …
  • often structured data
  • need to obtain training data, labels, …
  • …
Personnel
• PI synergy: R. Greiner, R. Goebel, C. Szepesvari
• 18 Software developers
• 4 Postdocs (3 AICML)
• 14 UGrad / IIP students
• 17 Grad students (11 MSc, 6 PhD)
Partners/Collaborators
• 6 UofA CS profs
• 5 UofA Bioscientists
• Non-UofA collaborators:
  • Cross Cancer Institute (Alberta Cancer Board)
  • University of Alberta Hospital
  • Boston University, Miami University
  • Dept of Homeland Security
Additional Resources
• Grants
  • $440K PENCE (Proteome Analyst)
  • $600K ACB (Brain Tumour)
  • Part of
    • $3.6M GenomeCanada (Human Metabolome Project)
    • $5.5M GenomeCanada (Alberta Transplant Institute)
    • $1.7M ACB (misc PolyomX grants)
• In Kind: Data from CCI, ATI
  • 1970+ MRI scans (260 patients); 270 labeled
  • 300 (30K – 50K) Microarray chips
  • 80 (250K) SNP Chips
Highlights
• The Human Metabolome is ~completed and annotated
  • described in Science, Nature, …
  • Human Metabolome DataBase used by 78,673 Visitors (438,481 pageviews)
• Proteome Analyst is world’s best predictor of subcell location
  • analyzed >1,000,000 proteins, for >1,000 users
• Patent filed for Brain Tumor Software
• Effective new approach for learning to classify Microarrays
• Virus classifier obtained 98.5% accuracy!
[Figure labels: 30,000; SNP Analysis; Microarray; Proteomics; Metabolomics]
Projects and Status
1. Brain Tumour Analysis (ongoing) (poster # 5)
2. Human Metabolome (new)
3. PolyomX (ongoing) (poster #8)
4. Proteome Analysis (ongoing) (posters # 6,7)
5. Whole Genome Analysis (ongoing)
[Figure: 30,000 Genes (Genomics); 3000 Enzymes (Proteomics); 1500 Chemicals (Metabolomics); Subcellular Locations]
Brain Tumour Project
Technical Details
How to Treat Brain Tumours?
• Irradiate ONLY visible tumor
  • No! Must also kill “(radiographically) occult” cancer cells surrounding tumour!
• Irradiate everything within 2 cm margin around tumor
  • But that …
    • also includes normal cells
    • still misses other occult cells
  • Standard Practice!
How to Treat Brain Tumours?
• BETTER: Predict (from earlier data) location of occult cells
  • Just irradiate that region!
• Minimize number of normal cells zapped
  • to minimize loss of brain function
• Meaningful, as conformal radiotherapy can zap arbitrary shapes!
How to Predict?
• Occult cells: the region where tumour cells will grow next (Assumption)
• Use prior data (260 patients)
  • Observe each patient over time: how tumours have grown
• Predict patterns, based on properties of tumour, patient, region, …
Technology…
• Using Discriminative Random Field
  • Segmentation
  • Growth Prediction
• Extensions:
  • Increase Accuracy: Support Vector Random Field
  • Increase Computational Efficiency: Decoupled SVRF
  • Exploit Unlabeled Region: Semi-Supervised (D)SVRF
Brain Tumour: Future Work
• Incorporate other modalities
  • Diffusion Tensor Imaging
  • PET
  • …
• Compute other features:
  • Textures (BGLAM)
  • Using alignment
• Improve learning algorithms
  • Use Active Learning techniques to determine which regions/slices/studies/patients to label, using which human labeler
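The active-learning item can be illustrated with uncertainty sampling, one common strategy. The slides do not say which strategy the project uses, so this is only a sketch under that assumption. The function name `most_uncertain`, the slice identifiers, and the probabilities are all hypothetical.

```python
def most_uncertain(probabilities, budget=1):
    """Return the `budget` items whose predicted tumour probability is
    closest to 0.5, i.e. where the current model is least sure."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled MRI slices (invented numbers).
tumour_prob = {
    "slice_01": 0.97,
    "slice_02": 0.52,
    "slice_03": 0.10,
    "slice_04": 0.41,
}

# Ask the human labeler about the two slices the model is least sure of.
to_label = most_uncertain(tumour_prob, budget=2)
```

Choosing among several human labelers, as the slide mentions, would need an extra cost or reliability term that this sketch leaves out.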
Human Metabolome Project
Technical Details
HMP Overview
• Goal: identify & quantify the entire human “metabolome”
  • all small endogenous and exogenous chemicals that appear in a non-trivial quantity in people…
[Figure: 30,000 Genes (Genomics); 3200 Enzymes (Proteomics); 2300 Chemicals (Metabolomics)]
``HMDB: The Human Metabolome Database'', Nucleic Acids Research, January 2007.
HMP #1: Fast Profiling
• Given an NMR spectrum (blood, urine, CSF), autonomously find & quantify >100 compounds, in < 2 minutes
• If know “NMR signature” of each metabolite… then linear least squares
• Except … “signature” not stable – shifts with unobservable ions
  • Think EM… ML challenge
• Acquire “conditional NMR signature”
  • Active Learning
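The "known signature, then linear least squares" case can be sketched as follows. It assumes the observed spectrum is a noise-free weighted sum of fixed per-compound signatures, which is exactly the idealization the slide says fails in practice because signatures shift. The function names, the 5-bin signatures, and the concentrations are hypothetical toy values, not real NMR data.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("signatures are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quantify(spectrum, signatures):
    """Least-squares concentrations c minimising ||S c - spectrum||^2, where
    signatures[k] is the unit-concentration spectrum of compound k."""
    k = len(signatures)
    # Normal equations: (S^T S) c = S^T y
    gram = [[sum(u * v for u, v in zip(signatures[i], signatures[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(u * y for u, y in zip(signatures[i], spectrum)) for i in range(k)]
    return solve_linear(gram, rhs)


# Toy data (invented): three compounds observed over five spectral bins.
signatures = [
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
true_conc = [2.0, 0.5, 3.0]
spectrum = [sum(c * s[i] for c, s in zip(true_conc, signatures)) for i in range(5)]

estimated = quantify(spectrum, signatures)
```

Real concentrations are non-negative, so a practical version would likely add a non-negativity constraint. Handling shifting signatures (the "Think EM" point) is outside this sketch.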
HMP #2: Classify Patients
• Given: Metabolic profile of patient
  • NMR/Mass spec of patient’s urine, blood, CSF
• Predict: Patient’s disease state
  • Reaction to Rx; Cachexia; Cancer
• The role of ML … Learn a Profile → Dx classifier

[Figure: Collect patient urine → Obtain NMR spectrum → Compute Metabolic Profile → Classify Profile (Classifier: Cachexia?) → Cachexia = Yes!]

Example metabolic profile (from the figure):
  Glucose      414.2
  Hippurate    599.3
  Histidine    2.73
  Isoleucine   10.44
  Isopropanol  16.01
  Lactate      40.83
  Lactose      90.3
  …
  Leucine      5.6
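A profile-to-diagnosis classifier can be sketched with a nearest-centroid rule on log-scaled concentrations. The slides do not name the learner actually used, so the method here is an assumption. The training profiles and the labels "positive" / "negative" are invented for illustration and carry no clinical meaning. Only the compound names and the query values come from the slide.

```python
import math


def log_profile(profile, compounds):
    """Turn a {compound: concentration} dict into a log-scaled feature vector."""
    return [math.log1p(profile.get(name, 0.0)) for name in compounds]


def train_centroids(examples, compounds):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for profile, label in examples:
        vec = log_profile(profile, compounds)
        acc = sums.setdefault(label, [0.0] * len(compounds))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(profile, centroids, compounds):
    """Predict the class whose centroid is closest in Euclidean distance."""
    vec = log_profile(profile, compounds)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


compounds = ["Glucose", "Hippurate", "Lactate"]

# Invented training profiles; the numbers have no clinical meaning.
training = [
    ({"Glucose": 400.0, "Hippurate": 600.0, "Lactate": 40.0}, "positive"),
    ({"Glucose": 420.0, "Hippurate": 580.0, "Lactate": 45.0}, "positive"),
    ({"Glucose": 100.0, "Hippurate": 200.0, "Lactate": 10.0}, "negative"),
    ({"Glucose": 120.0, "Hippurate": 180.0, "Lactate": 12.0}, "negative"),
]
centroids = train_centroids(training, compounds)

# Query values taken from the example profile on the slide.
query = {"Glucose": 414.2, "Hippurate": 599.3, "Lactate": 40.83}
predicted = classify(query, centroids, compounds)
```

The log scaling is a design choice: concentrations span several orders of magnitude, and without it the largest compounds would dominate the distance.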
HMP #3: Chemical Property
• Given: Specific metabolite (chemical)
• Predict:
  • Chemical properties of metabolite: Solubility, Melting point, …
  • Biological properties of metabolite: which reactions consume it, …
• The role of ML … Learn a Metabolite → Property classifier
PolyomX Project
Technical Details
PolyomX
• Given: Description of a patient (SNP, Microarray, Metabolomic Profile, …)
• Predict:
  • Dx: Breast Cancer, Ovarian Cancer, …
  • Rx: Prostate Cancer Toxicity, Cachexia, …
• The role of ML … Learn a Patient → Dx classifier, …
``Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms'', Clinical Cancer Research, April 2004.
``Association of DNA Repair and Steroid Metabolism Gene Polymorphisms with Clinical Late Toxicity in Patients Treated with Conformal Radiotherapy for Prostate Cancer'', Clinical Cancer Research, April 2006.
PolyomX: Future Work
• Better tools for analyzing microarrays
  • Rank-One Bicluster Classifier (RoBiC)
• Scaling up to 250K SNP chips
• Incorporating >1 modality
• Many other tasks:
  • Ovarian Cancer (microarray)
  • Use pathways to understand microarray
  • Microtubules docking
  • …
Proteome Analyst
Technical Details
Proteome Analysis
• Given: Protein (FASTA format)
• Predict: Properties of Protein
  • General function
  • Subcellular localization
• The role of ML … Learn a Protein → Location classifier
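The input side of this task can be sketched as reading FASTA records and turning each sequence into a fixed-length feature vector. Amino-acid composition is used here only as a simple stand-in feature. The slides do not describe the features Proteome Analyst actually uses, and the example record is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = sequence.upper()
    counts = [seq.count(residue) for residue in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical record, for illustration only.
fasta_text = """>demo_protein hypothetical example
MKTAYIAK
QRQISFVK
"""

records = parse_fasta(fasta_text)
features = composition(records[0][1])
```

A vector like `features` could then be passed to any standard classifier to predict a location label.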
Results so far
• Proteome Analyst classifiers
  • General Function: 80 – 90%
  • SubCellular Location: ~90%
    • Best known, by any system! (BioInformatics, 2004)
• “Explain” facility has already helped users to identify problems in dataset…
``The Path-A metabolic pathway prediction web server'', Nucleic Acids Research, July 2006.
``PA-GOSUB: A Searchable Database of Model Organism Protein Sequences With Their Predicted GO Molecular Function and Subcellular Localization'', Nucleic Acids Research, Dec 2005.
``Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-Throughput Proteome Annotations'', Nucleic Acids Research, July 2004
``Visual Explanation and Auditing of Evidence with Additive Classifiers'', IAAI06, July 2006
Current Proteome Analyst Tasks
• Analyze metabolic pathways
• Incorporate hierarchy (GO)
• Use other information
  • Motifs in protein, …
• Other applications
  • Relate to Microarray data
• Use GLOBAL properties of complete proteome … phylogenetic hierarchy
• …
Whole Genome Analysis
Technical Details
Whole Genome Analysis
• Heuristic selection of whole genome substrings, to increase efficiency and accuracy of subtype identification in HIV genome
• Construct Complete Composition Vector (CCV) nucleotide representation, as approximate signature of viral genome
• 100% recognition of subtypes in 867 whole genome examples
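The idea of a composition-based genome signature can be sketched by concatenating k-mer frequency vectors for k = 1 … K. The slide does not give the CCV definition, and the published formulation may differ (for example by correcting for background frequencies). Treat this as a simplified stand-in: the function names and the toy sequence are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every length-k string over ACGT, in sorted k-mer order.
    Windows containing other symbols (e.g. N) are skipped."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in sorted(counts)]


def composition_signature(seq, max_k=3):
    """Concatenate the k-mer frequency vectors for k = 1 .. max_k."""
    vector = []
    for k in range(1, max_k + 1):
        vector.extend(kmer_frequencies(seq, k))
    return vector


# Toy sequence, not a real viral genome.
signature = composition_signature("ACGTACGTAC", max_k=2)
```

Signatures of two genomes could then be compared with any vector distance, and a subtype assigned from the nearest labeled example.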
Other Bioinformatics Tasks
• Predict Bull’s Expected Breeding Value from SNPs
  • Bovine Haplotype
• Predict Tumour Rejection from Microarray
• Other challenges from colleagues at Univ Hospital, Cross Cancer Inst.
• …