Evaluating Machine Learning Approaches for Aiding Probe Selection for Gene-Expression Arrays

J. Tobler, M. Molla, J. Shavlik (University of Wisconsin-Madison)

M. Molla, E. Nuwaysir, R. Green (NimbleGen Systems Inc.)

Oligonucleotide Microarrays

[Figure: probes attached to the chip surface]

Specific probes synthesized at known spots on the chip’s surface

Probes complementary to RNA of genes to be measured

Typical gene (1kb+) MUCH longer than typical probe (24 bases)

Probes: Good vs. Bad

[Figure: a good probe and a bad probe. Blue = probe, red = sample.]

Probe-Picking Method Needed

Hybridization characteristics differ between probes

Probe set represents very small subset of gene

Accurate measurement of expression requires good probe set

Related Work

Use known hybridization characteristics (Lockhart et al. 1996)

Melting point (Tm) predictions (Kurata and Suyama 1999; Li and Stormo 2001)

Stable secondary structure (Kurata and Suyama 1999)

Our Approach

Apply established machine-learning algorithms: train on categorized examples, then test on examples with the category hidden

Choose features to represent probes

Categorize probes as good or bad

The Features

Feature name: description

fracA, fracC, fracG, fracT: the fraction of A, C, G, or T in the 24-mer

fracAA, fracAC, fracAG, fracAT, fracCA, fracCC, fracCG, fracCT, fracGA, fracGC, fracGG, fracGT, fracTA, fracTC, fracTG, fracTT: the fraction of each of these dimers in the 24-mer

n1, n2, …, n24: the particular nucleotide (A, C, G, or T) at the specified position in the 24-mer

d1, d2, …, d23: the particular dimer (AA, AC, …, TT) at the specified position in the 24-mer
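As a rough illustration, the four feature groups can be computed from a probe string as sketched below. The function name `probe_features` is hypothetical, and normalizing the dimer fractions by the number of overlapping dimers (23 for a 24-mer) is an assumption; the slides do not state the denominator.

```python
from itertools import product

BASES = "ACGT"
DIMERS = ["".join(pair) for pair in product(BASES, repeat=2)]  # AA, AC, ..., TT


def probe_features(probe):
    """Return a dict of the 67 features for one probe (hypothetical helper)."""
    n = len(probe)
    dimers = [probe[i:i + 2] for i in range(n - 1)]  # overlapping dimers
    features = {}
    # fracA .. fracT: fraction of each base in the probe
    for base in BASES:
        features["frac" + base] = probe.count(base) / n
    # fracAA .. fracTT: fraction of each dimer (assumed denominator: n - 1)
    for dimer in DIMERS:
        features["frac" + dimer] = dimers.count(dimer) / (n - 1)
    # n1 .. n24: nucleotide at each position
    for i, base in enumerate(probe, start=1):
        features[f"n{i}"] = base
    # d1 .. d23: dimer starting at each position
    for i, dimer in enumerate(dimers, start=1):
        features[f"d{i}"] = dimer
    return features


feats = probe_features("CATCGATCGTAATCGTACCGGTCA")
```

For a 24-mer this yields 4 + 16 + 24 + 23 = 67 features, mixing numeric fractions with categorical position features.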

The Data

Gene Sequence: GTAGCTAGCATTAGCATGGCCAGTCATG…

Complement: CATCGATCGTAATCGTACCGGTCAGTAC…

Probe 1: CATCGATCGTAATCGTACCGGTCA

Probe 2: ATCGATCGTAATCGTACCGGTCAG

Probe 3: TCGATCGTAATCGTACCGGTCAGT

… …

Tilings of 8 genes (from E. coli & B. subtilis)

Every possible probe (~10,000 probes)

Genes known to be expressed in sample
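The tiling shown above can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it takes the base-by-base complement exactly as the slide does (not the reverse complement).

```python
def complement(seq):
    """Base-by-base complement, as on the slide (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def tile_probes(gene, probe_len=24):
    """Every possible probe: one per starting position along the complement."""
    comp = complement(gene)
    return [comp[i:i + probe_len] for i in range(len(comp) - probe_len + 1)]


# Only the visible prefix of the slide's example gene is used here.
gene_prefix = "GTAGCTAGCATTAGCATGGCCAGTCATG"
probes = tile_probes(gene_prefix)
```

A sequence of length L gives L - 23 probes of length 24, which is how 8 genes of roughly 1 kb or more lead to about 10,000 probes.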

Our Microarray

Defining our Categories

[Figure: histogram of frequency against normalized probe intensity; x-axis ticks at 0, .05, .15, and 1.0]

Low intensity = BAD probes

Mid-intensity = not used in training set

High intensity = GOOD probes (32%)
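A labeling rule along these lines might look like the sketch below. The cutoffs 0.05 and 0.15 are an assumption read off the axis ticks of the histogram; the slide does not state them explicitly, and the function name is hypothetical.

```python
def categorize(intensity, low_cutoff=0.05, high_cutoff=0.15):
    """Map a normalized intensity in [0, 1] to a training category.

    Cutoffs are assumed from the histogram's axis ticks.
    Returns None for mid-intensity probes, which are left out of training.
    """
    if intensity < low_cutoff:
        return "bad"
    if intensity > high_cutoff:
        return "good"
    return None


intensities = [0.01, 0.10, 0.40]
labels = [categorize(x) for x in intensities]
training = [(x, y) for x, y in zip(intensities, labels) if y is not None]
```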

The Machine Learning Techniques

Naïve Bayes (Mitchell 1997)

Neural Networks (Rumelhart et al. 1995)

Decision Trees (Quinlan 1996)

Can interpret predictions of each learner probabilistically

Naïve Bayes

Assumes conditional independence between features

Make judgments about test set examples based on conditional probability estimates made on training set

Naïve Bayes

For each example in the test set, evaluate the following:

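$$P(\text{low}) \prod_i P(\text{feature}_i = \text{value}_i \mid \text{low})$$

$$P(\text{high}) \prod_i P(\text{feature}_i = \text{value}_i \mid \text{high})$$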
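The two quantities above can be estimated from counts on the training set. The sketch below is one way to do it for discrete features only; the add-one (Laplace) smoothing and the use of log-probabilities are assumptions added for numerical safety, not something the slides specify, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """examples: list of dicts mapping feature name -> discrete value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    feature_values = defaultdict(set)    # feature -> values seen in training
    for example, label in zip(examples, labels):
        for feature, value in example.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def log_score(model, example, label):
    """log P(label) + sum_i log P(feature_i = value_i | label)."""
    class_counts, value_counts, feature_values = model
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for feature, value in example.items():
        count = value_counts[(label, feature)][value]
        # Add-one smoothing (an assumption) avoids log(0) for unseen pairs.
        denom = class_counts[label] + len(feature_values[feature])
        score += math.log((count + 1) / denom)
    return score


def classify(model, example):
    return max(("low", "high"), key=lambda label: log_score(model, example, label))


train_x = [{"n1": "A", "n2": "C"}, {"n1": "A", "n2": "G"},
           {"n1": "T", "n2": "T"}, {"n1": "T", "n2": "C"}]
train_y = ["high", "high", "low", "low"]
model = train_naive_bayes(train_x, train_y)
prediction = classify(model, {"n1": "A", "n2": "C"})
```

Numeric features such as fracA would first need to be discretized (or modeled with a density) before they fit this counting scheme.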

Neural Network (1-of-n encoding with probe length = 3)

[Figure: network diagram for the example probe sequence “CAG”; labels: weights, activation, error, output (good or bad)]
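The 1-of-n input encoding for the example probe “CAG” can be sketched as follows. The ordering of the four input units per position (A, C, G, T) is an assumption; any fixed ordering works as long as it is used consistently.

```python
BASES = "ACGT"  # assumed unit order within each position


def one_of_n(probe):
    """Encode each nucleotide as four inputs, exactly one of which is 1."""
    encoding = []
    for base in probe:
        encoding.extend(1 if base == b else 0 for b in BASES)
    return encoding


inputs = one_of_n("CAG")
```

A probe of length 3 gives 12 input units; a 24-mer encoded the same way would give 96.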

Decision Tree

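Automatically builds a tree of rules

[Figure: example decision tree testing features such as fracTC and fracAC (branches Low / High) and a nucleotide position (branches A, C, G, T), with leaves labeled Good Probe or Bad Probe]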

Decision Tree

The information gain of a feature, F, is:

$$\text{InformationGain}(S, F) = \text{Entropy}(S) - \sum_{v \in \text{Values}(F)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$
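For a discrete feature, the information gain can be computed directly from the definition. This is a minimal sketch with hypothetical function names; numeric features such as fracAC would first need a threshold split (for example Low / High), which is not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Gain of a feature whose value for example k is values[k]."""
    total = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]  # S_v
        gain -= (len(subset) / total) * entropy(subset)
    return gain


labels = ["good", "good", "bad", "bad"]
informative = information_gain(["A", "A", "T", "T"], labels)
uninformative = information_gain(["A", "T", "A", "T"], labels)
```

A feature that separates the classes perfectly gets the full entropy of the set as its gain, while a feature that is independent of the class gets zero.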

Information Gain per Feature

[Figure: information gain per feature, in two groups. Probe composition features: base and dimer fractions. Base position features: by base position (1 to 24) and by dimer position (1 to 23).]

Cross-Validation

Leave-one-out testing: For each gene (of the 8)

Train on all but this gene

Test on this gene

Record result

Forget what was learned

Average results across 8 test genes
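The loop above can be sketched generically. Here `train` and `evaluate` are hypothetical callables standing in for any of the three learners and for whatever result is recorded; the toy majority-class learner and the gene names exist only to make the example runnable.

```python
from collections import Counter


def leave_one_gene_out(data_by_gene, train, evaluate):
    """data_by_gene: dict mapping gene name -> list of (features, label)."""
    results = []
    for held_out in data_by_gene:
        # Train on all but this gene.
        training = [example
                    for gene, examples in data_by_gene.items()
                    if gene != held_out
                    for example in examples]
        model = train(training)
        # Test on this gene and record the result.
        results.append(evaluate(model, data_by_gene[held_out]))
        # The model is rebuilt from scratch on the next iteration.
    return sum(results) / len(results)


def train_majority(training):
    """Toy learner: always predict the most common training label."""
    return Counter(label for _, label in training).most_common(1)[0][0]


def accuracy(model, test):
    return sum(label == model for _, label in test) / len(test)


toy = {
    "geneA": [("p1", "good"), ("p2", "good"), ("p3", "good")],
    "geneB": [("p4", "good"), ("p5", "bad")],
    "geneC": [("p6", "good"), ("p7", "good")],
}
mean_accuracy = leave_one_gene_out(toy, train_majority, accuracy)
```

Holding out whole genes, rather than individual probes, keeps overlapping neighboring probes from the same gene out of the training set for that test.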

Typical Probe-Intensity Prediction Across Short Region

[Figure: actual probe intensity and the predictions of Naïve Bayes, Decision Tree, and Neural Network, plotted against the starting nucleotide position for the 24-mer probe (positions 650 to 700)]

Probe-Picking Results

[Figure: results plotted against the number of probes selected (0 to 20) for Naïve Bayes, Neural Network, Decision Tree, Primer Melting Point, and a Perfect Selector]

Current and Future Directions

Consider more features: folding patterns, melting point

Feature selection

Evaluate specificity along with sensitivity (i.e., consider false positives)

Evaluate probe selection + gene calling

Try more ML techniques: SVMs, ensembles, …

Take-Home Message

Machine learning does a good job on this part of the probe-selection problem: it is easy to collect a large number of training examples, and easily measured features work well

Intelligent probe selection can increase microarray accuracy and efficiency

Acknowledgements

NimbleGen Systems, Inc. for providing the intensities from the eight tiled genes measured on their maskless array. Darryl Roy for helping in creating the training data. Grants NIH 2 R44 HG02193-02, NLM 1 R01 LM07050-01, NSF IRI-9502990, NIH 2 P30 CA14520-29, and NIH 5 T32 GM08349.

Thanks
