Bio- and Medical- Informatics Presenter: Russell Greiner

Page 1: Bio- and Medical- Informatics Presenter: Russell Greiner

Bio- and Medical- Informatics

Presenter: Russell Greiner

Page 2: Bio- and Medical- Informatics Presenter: Russell Greiner

Vision Statement

Helping the world understand bio- and medical- informatics data … and make informed decisions.*

* Potential beneficiaries:
• biological and medical researchers,
• practicing clinicians, and
• the people they serve.

Page 3: Bio- and Medical- Informatics Presenter: Russell Greiner

Motivation

• High impact on bio-science and society
• Local bioinformatics expertise
• ML has a key role:
    • actual patterns (predictors, …) not known
    • lots of data
• Challenging ML problems
    • data is high dimensional, noisy, …
    • often structured data
    • need to obtain training data, labels, …
    • …

Page 4: Bio- and Medical- Informatics Presenter: Russell Greiner

Personnel

• PI synergy: R. Greiner, R. Goebel, C. Szepesvari
• 18 Software developers
• 4 Postdocs (3 AICML)
• 14 UGrad / IIP students
• 17 Grad students (11 MSc, 6 PhD)

Page 5: Bio- and Medical- Informatics Presenter: Russell Greiner

Partners/Collaborators

• 6 UofA CS profs
• 5 UofA Bioscientists
• Non-UofA collaborators:
    • Cross Cancer Institute (Alberta Cancer Board)
    • University of Alberta Hospital
    • Boston University, Miami University
    • Dept of Homeland Security

Page 6: Bio- and Medical- Informatics Presenter: Russell Greiner

Additional Resources

• Grants
    • $440K PENCE (Proteome Analyst)
    • $600K ACB (Brain Tumour)
    • Part of
        • $3.6M GenomeCanada (Human Metabolome Project)
        • $5.5M GenomeCanada (Alberta Transplant Institute)
        • $1.7M ACB (misc PolyomX grants)
• In Kind: Data from CCI, ATI
    • 1970+ MRI scans (260 patients); 270 labeled
    • 300 (30K – 50K) Microarray chips
    • 80 (250K) SNP Chips

Page 7: Bio- and Medical- Informatics Presenter: Russell Greiner

Highlights

• The Human Metabolome is
    • ~completed and annotated
    • described in Science, Nature, …
    • Human Metabolome DataBase used by 78,673 Visitors (438,481 pageviews)
• Proteome Analyst is world’s best predictor of subcell location
    • analyzed >1,000,000 proteins, for >1,000 users
• Patent filed for Brain Tumor Software
• Effective new approach for learning to classify Microarrays
• Virus classifier obtained 98.5% accuracy!

Page 8: Bio- and Medical- Informatics Presenter: Russell Greiner

[Figure: SNP Analysis, Microarray, Proteomics, Metabolomics; label: 30,000]

Page 9: Bio- and Medical- Informatics Presenter: Russell Greiner

Projects and Status

1. Brain Tumour Analysis (ongoing) (poster # 5)

2. Human Metabolome (new)

3. PolyomX (ongoing) (poster #8)

4. Proteome Analysis (ongoing) (posters # 6,7)

5. Whole Genome Analysis (ongoing)

[Figure: Genomics (30,000 Genes), Proteomics (3000 Enzymes), Metabolomics (1500 Chemicals); Subcellular Locations]

Page 10: Bio- and Medical- Informatics Presenter: Russell Greiner

Brain Tumour Project

Technical Details

Page 11: Bio- and Medical- Informatics Presenter: Russell Greiner

How to Treat Brain Tumours?

• Irradiate ONLY visible tumour
    • No! Must also kill “(radiographically) occult” cancer cells surrounding tumour!
• Irradiate everything within 2 cm margin around tumour
    • Standard Practice!
    • But that …
        • also includes normal cells
        • still misses other occult cells
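
The fixed-margin rule above can be illustrated with a small sketch. It is a simplification under stated assumptions, not clinical planning code: the grid is 2D rather than a 3D volume, voxels are isotropic, and the name `margin_region` and its parameters are hypothetical. Only the 2 cm figure comes from the slide.

```python
import math


def margin_region(tumour_cells, grid_shape, voxel_mm, margin_mm=20.0):
    """Return the set of grid cells within margin_mm of any tumour cell.

    tumour_cells: iterable of (row, col) cells marked as visible tumour.
    grid_shape:   (rows, cols) of the image grid (2D here for brevity).
    voxel_mm:     edge length of one cell in millimetres (assumed isotropic).
    """
    rows, cols = grid_shape
    tumour_cells = list(tumour_cells)
    target = set()
    for r in range(rows):
        for c in range(cols):
            # A cell is irradiated if ANY tumour cell lies within the margin.
            for tr, tc in tumour_cells:
                distance = math.hypot((r - tr) * voxel_mm, (c - tc) * voxel_mm)
                if distance <= margin_mm:
                    target.add((r, c))
                    break
    return target


# One tumour cell in the middle of an 11 x 11 grid of 10 mm cells.
region = margin_region({(5, 5)}, (11, 11), voxel_mm=10.0)
print(len(region), "cells would be irradiated")
```

Every cell in the returned set is treated whether or not it holds occult cells, which is the drawback the slide points out.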

Page 12: Bio- and Medical- Informatics Presenter: Russell Greiner

How to Treat Brain Tumours?

• BETTER:
    • Predict (from earlier data) location of occult cells
    • Just irradiate that region!
• Minimize number of normal cells zapped, to minimize loss of brain function
• Meaningful, as conformal radiotherapy can zap arbitrary shapes!

Page 13: Bio- and Medical- Informatics Presenter: Russell Greiner

How to Predict?

• Occult cells ≈ region where tumour cells will grow next (Assumption)
• Use prior data (260 patients)
    • Observe each patient over time – how tumours have grown
• Predict patterns, based on properties of tumour, patient, region, …

Page 14: Bio- and Medical- Informatics Presenter: Russell Greiner

Technology…

• Using Discriminative Random Field
    • Segmentation
    • Growth Prediction
• Extensions:
    • Increase Accuracy: Support Vector Random Field
    • Increase Computational Efficiency: Decoupled SVRF
    • Exploit Unlabeled Region: Semi-Supervised (D)SVRF

Page 15: Bio- and Medical- Informatics Presenter: Russell Greiner

Brain Tumour: Future Work

• Incorporate other modalities
    • Diffusion Tensor Imaging
    • PET
    • …
• Compute other features:
    • Textures (BGLAM)
    • Using alignment
• Improve learning algorithms
• Use Active Learning techniques to determine
    • which regions/slices/studies/patients to label
    • using which human labeler
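
One common way to choose what to label next is uncertainty sampling: ask a human to label the items the current model is least sure about. The sketch below illustrates that general idea only; the slides do not say which active learning strategy is intended, and the function name, the slice identifiers, and the probabilities are all hypothetical.

```python
def most_uncertain(probabilities, batch_size=1):
    """Pick the unlabeled items whose predicted probability is closest to 0.5.

    probabilities: dict mapping an item id (e.g. an MRI slice) to the current
                   model's predicted probability that the item contains tumour.
    batch_size:    how many items to send to the human labeler.
    """
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:batch_size]


# Made-up predictions for four unlabeled slices.
predictions = {"slice_a": 0.95, "slice_b": 0.52, "slice_c": 0.10, "slice_d": 0.40}
print(most_uncertain(predictions, batch_size=2))
```

Choosing which human labeler to ask, as the slide also mentions, would need extra information (such as cost or reliability per labeler) that this sketch does not model.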

Page 16: Bio- and Medical- Informatics Presenter: Russell Greiner

Projects and Status

1. Brain Tumour Analysis (ongoing) (poster # 5)

2. Human Metabolome (new)

3. PolyomX (ongoing) (poster #8)

4. Proteome Analysis (ongoing) (poster # 6,7)

5. Whole Genome Analysis (ongoing)

[Figure: Genomics (30,000 Genes), Proteomics (3000 Enzymes), Metabolomics (1500 Chemicals); Subcellular Locations]

Page 17: Bio- and Medical- Informatics Presenter: Russell Greiner
Page 18: Bio- and Medical- Informatics Presenter: Russell Greiner

Human Metabolome Project

Technical Details

Page 19: Bio- and Medical- Informatics Presenter: Russell Greiner

HMP Overview

• Goal: identify & quantify the entire human “metabolome”
    • all small endogenous and exogenous chemicals that appear in a non-trivial quantity in people…

[Figure: Genomics (30,000 Genes), Proteomics (3200 Enzymes), Metabolomics (2300 Chemicals)]

``HMDB: The Human Metabolome Database'', Nucleic Acids Research, January 2007.

Page 20: Bio- and Medical- Informatics Presenter: Russell Greiner

HMP #1: Fast Profiling

• Given an NMR spectrum (blood, urine, CSF), autonomously find & quantify >100 compounds, in < 2 minutes
• If know “NMR signature” of each metabolite … then linear least squares
• Except … “signature” not stable – shifts with unobservable ions
    • Think EM… ML challenge
• Acquire “conditional NMR signature”
    • Active Learning
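
The "known signature, then linear least squares" case can be sketched as follows: treat the observed spectrum as a weighted sum of fixed reference spectra and solve for the weights. This is a minimal illustration under the idealised assumption that signatures are fixed and linearly independent, which the slide itself says does not hold in practice. The function names and the toy numbers are hypothetical, and a library routine such as a standard least-squares solver would normally replace the hand-written elimination.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_concentrations(signatures, spectrum):
    """Least-squares weights c minimising ||sum_k c[k] * signatures[k] - spectrum||^2.

    signatures: list of reference spectra, one per compound (equal lengths).
    spectrum:   observed spectrum, same length as each signature.
    Solved via the normal equations; fails if signatures are linearly dependent.
    """
    k = len(signatures)
    gram = [[sum(p * q for p, q in zip(signatures[i], signatures[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * q for p, q in zip(signatures[i], spectrum)) for i in range(k)]
    return solve_linear_system(gram, rhs)


# Toy example: two made-up signatures mixed with weights 2.0 and 0.5.
signatures = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 1.0, 3.0]]
spectrum = [2.0, 0.5, 4.5, 1.5]
print(fit_concentrations(signatures, spectrum))
```

When peak positions shift with unobserved ion concentrations, the signature matrix is no longer fixed, which is why the slide frames the real problem as an EM-style learning task.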

Page 21: Bio- and Medical- Informatics Presenter: Russell Greiner

HMP #2: Classify Patients

• Given: Metabolic profile of patient
    • NMR/Mass spec of patient’s urine, blood, CSF
• Predict: Patient’s disease state
    • Reaction to Rx; Cachexia; Cancer
• The role of ML … Learn Profile → Dx classifier

[Figure: pipeline. Collect patient urine → Obtain NMR spectrum → Compute Metabolic Profile → Classify Profile (Classifier: “Cachexia?”) → “Cachexia = Yes!”]

Example metabolic profile from the figure (units not given):

Metabolite     Value
Glucose        414.2
Hippurate      599.3
Histidine      2.73
Isoleucine     10.44
Isopropanol    16.01
Lactate        40.83
Lactose        90.3
…              …
Leucine        5.6
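
To make the "profile in, diagnosis out" step concrete, here is a minimal sketch using a nearest-centroid classifier. The slides do not say which learning algorithm is used, so this is a stand-in chosen for brevity; the function names, the two-feature profiles, and the labels are all hypothetical toy values, not patient data.

```python
def train_centroids(profiles, labels):
    """Average the profiles of each class to get one centroid per label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, profile):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], profile))


# Toy training set: each profile is a short list of metabolite values.
train_profiles = [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0], [10.0, 10.0]]
train_labels = ["no", "no", "yes", "yes"]
centroids = train_centroids(train_profiles, train_labels)
print(classify(centroids, [8.0, 9.0]))
```

A real profile has on the order of a hundred metabolite values on very different scales, so feature scaling and a stronger learner would likely be needed.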

Page 22: Bio- and Medical- Informatics Presenter: Russell Greiner

HMP #3: Chemical Property

• Given: Specific metabolite (chemical)
• Predict:
    • Chemical properties of metabolite: Solubility, Melting point, …
    • Biological properties of metabolite: which reactions consume it, …
• The role of ML … Learn Metabolite → Property classifier

Page 23: Bio- and Medical- Informatics Presenter: Russell Greiner

PolyomX Project

Technical Details

Page 24: Bio- and Medical- Informatics Presenter: Russell Greiner

PolyomX

• Given: Description of a patient (SNP, Microarray, Metabolomic Profile, …)
• Predict:
    • Dx: Breast Cancer, Ovarian Cancer, …
    • Rx: Prostate Cancer Toxicity, Cachexia, …
• The role of ML … Learn Patient → Dx classifier, …

``Predictive Models for Breast Cancer Susceptibility from Multiple, Single Nucleotide Polymorphisms'', Clinical Cancer Research, April 2004.

``Association of DNA Repair and Steroid Metabolism Gene Polymorphisms with Clinical Late Toxicity in Patients Treated with Conformal Radiotherapy for Prostate Cancer'', Clinical Cancer Research, April 2006.

Page 25: Bio- and Medical- Informatics Presenter: Russell Greiner

PolyomX: Future Work

• Better tools for analyzing microarrays
    • Rank-One Bicluster Classifier (RoBiC)
• Scaling up to 250K SNP chips
• Incorporating >1 modality
• Many other tasks:
    • Ovarian Cancer (microarray)
    • Use pathways to understand microarray
    • Microtubules docking
    • …

Page 26: Bio- and Medical- Informatics Presenter: Russell Greiner

Proteome Analyst

Technical Details

Page 27: Bio- and Medical- Informatics Presenter: Russell Greiner

Proteome Analysis

• Given: Protein (FASTA format)
• Predict: Properties of Protein
    • General function
    • Subcellular localization
• The role of ML … Learn Protein → Location classifier

Page 28: Bio- and Medical- Informatics Presenter: Russell Greiner

Results so far

• Proteome Analyst classifiers
    • General Function: 80 – 90%
    • SubCellular Location: ~90%
    • Best known, by any system! (BioInformatics, 2004)
• “Explain” facility has already helped users to identify problems in dataset…

``The Path-A metabolic pathway prediction web server'', Nucleic Acids Research, July 2006.

``PA-GOSUB: A Searchable Database of Model Organism Protein Sequences With Their Predicted GO Molecular Function and Subcellular Localization'', Nucleic Acids Research, Dec 2005.

``Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-Throughput Proteome Annotations'', Nucleic Acids Research, July 2004.

``Visual Explanation and Auditing of Evidence with Additive Classifiers'', IAAI06, July 2006.

Page 29: Bio- and Medical- Informatics Presenter: Russell Greiner

Current Proteome Analyst Tasks

• Analyze metabolic pathways
• Incorporate hierarchy (GO)
• Use other information
    • Motifs in protein, …
• Other applications
    • Relate to Microarray data
• Use GLOBAL properties of complete-proteome … phylogenetic hierarchy…

Page 30: Bio- and Medical- Informatics Presenter: Russell Greiner

Whole Genome Analysis

Technical Details

Page 31: Bio- and Medical- Informatics Presenter: Russell Greiner

Whole Genome Analysis

• heuristic selection of whole genome substrings, to increase efficiency and accuracy of subtype identification in HIV genome
• construct Complete Composition Vector (CCV) nucleotide representation, as approximate signature of viral genome
• 100% recognition of subtypes in 867 whole genome examples
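
The idea of a composition vector as a genome signature can be sketched by counting short substrings (k-mers) and concatenating their relative frequencies for several values of k. This is a simplified illustration only: the slides do not give the exact CCV construction, published composition-vector methods may apply further adjustments to the raw frequencies, and the function name and parameters below are hypothetical.

```python
from collections import Counter
from itertools import product


def composition_vector(sequence, max_k=2, alphabet="ACGT"):
    """Concatenate relative k-mer frequencies for k = 1 .. max_k.

    For each k, the vector has one entry per possible k-mer over the
    alphabet, in lexicographic order of the alphabet string. Windows that
    contain characters outside the alphabet still count towards the total,
    so their share is simply not represented in the vector.
    """
    seq = sequence.upper()
    vector = []
    for k in range(1, max_k + 1):
        total = max(len(seq) - k + 1, 0)
        counts = Counter(seq[i:i + k] for i in range(total))
        for kmer in map("".join, product(alphabet, repeat=k)):
            vector.append(counts[kmer] / total if total else 0.0)
    return vector


# Toy sequence; a real viral genome would be thousands of bases long.
print(composition_vector("ACGTACGT", max_k=1))
```

Two genomes can then be compared by a distance between their vectors, with subtype assigned from the closest labeled examples.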

Page 32: Bio- and Medical- Informatics Presenter: Russell Greiner

Other Bioinformatics Tasks

• Predict Bull’s Expected Breeding Value from SNPs
    • Bovine Haplotype
• Predict Tumour Rejection from Microarray
• Other challenges from colleagues at Univ Hospital, Cross Cancer Inst.