OUTLINE Microarrays Processing Microarray Data – K- Means Clustering – Hierarchical Clustering...

OUTLINE

Microarrays

Processing Microarray Data

– K-means Clustering
– Hierarchical Clustering
– SOM

Microarrays

Gene Expression:
– Cells differ from one another because of differential gene expression,
– A gene is expressed by transcribing DNA into single-stranded mRNA,
– mRNA is later translated into a protein,
– Microarrays measure the level of mRNA expression.

Microarrays

Gene Expression:
– mRNA expression represents the dynamic aspects of the cell,
– mRNA is isolated and labeled with a fluorescent material,
– The labeled mRNA is hybridized to the target; the level of hybridization corresponds to the intensity of the light emitted when the array is scanned with a laser.

Microarrays

Animation (by A. Malcolm Campbell):

– http://www.bio.davidson.edu/Courses/genomics/chip/chip.html

Microarrays

Sample Application (oncotypeDX):

http://www.oncotypedx.com/en-US/Breast.aspx

Processing Microarray Data

Problems:
– Extract data from microarrays,
– Analyze the meaning of the multiple arrays.

Differentiating gene expression (R and G are the red- and green-channel intensities of a spot):
– R = G: not differentially expressed
– R > G: up-regulated
– R < G: down-regulated
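The R versus G comparison can be sketched in Python as a log-ratio test. This is only an illustration: the function name `classify_spot` and the `threshold` parameter (a cutoff in log2 units) are assumptions and do not come from the slides, and both intensities are assumed to be positive.

```python
import math


def classify_spot(r, g, threshold=1.0):
    """Classify one spot from its red (r) and green (g) intensities.

    Uses log2(r / g); `threshold` is a hypothetical cutoff in log2 units
    (1.0 corresponds to a two-fold change). Assumes r > 0 and g > 0.
    """
    log_ratio = math.log2(r / g)
    if log_ratio >= threshold:
        return "up-regulated"        # R clearly greater than G
    if log_ratio <= -threshold:
        return "down-regulated"      # R clearly smaller than G
    return "not differentially expressed"  # R roughly equal to G


if __name__ == "__main__":
    for r, g in [(400, 100), (100, 400), (110, 100)]:
        print(r, g, classify_spot(r, g))
```

Working with the log ratio rather than the raw ratio treats up- and down-regulation symmetrically: a four-fold increase gives +2 and a four-fold decrease gives -2.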

Characteristics of microarray data:
– Experiment = (gene1, gene2, …, geneN)
– Gene = (experiment1, experiment2, …, experimentM)
– N is often on the order of 10^4
– M is often on the order of 10^1
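One way to picture these two views is an N x M table with one row per gene and one column per experiment. The sketch below uses a tiny made-up table (4 genes, 3 experiments); the values and the helper names `gene_profile` and `experiment_profile` are purely illustrative assumptions.

```python
# Toy expression table: rows are genes, columns are experiments.
# The numbers are invented for illustration and are not real measurements.
expression = [
    [2.1, 0.3, 1.7],  # gene1 in experiment1..experiment3
    [0.2, 1.9, 0.1],  # gene2
    [2.0, 0.4, 1.8],  # gene3
    [0.1, 2.2, 0.3],  # gene4
]


def gene_profile(matrix, gene_index):
    """Gene = (experiment1, ..., experimentM): one row of the table."""
    return list(matrix[gene_index])


def experiment_profile(matrix, experiment_index):
    """Experiment = (gene1, ..., geneN): one column of the table."""
    return [row[experiment_index] for row in matrix]


if __name__ == "__main__":
    print(gene_profile(expression, 0))        # length M
    print(experiment_profile(expression, 1))  # length N
```

Clustering genes compares rows; clustering experiments compares columns.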

Microarray data:

Data Analysis:
– Clustering:
  – Which genes have similar functions?
  – Subdivide genes or experiments into meaningful classes.
– Classification:
  – Can we correctly classify an unknown experiment or gene into a known class?
  – For example: can we make better treatment decisions for a cancer patient based on a gene expression profile?

Clustering:
– Find classes in the data,
– Identify new classes,
– Identify gene correlations,
– Methods:
  – K-means clustering,
  – Hierarchical clustering,
  – Self-Organizing Maps (SOM).

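Distance Measures (between two expression profiles $x = (x_1, \dots, x_M)$ and $y = (y_1, \dots, y_M)$):
– Euclidean Distance:
  $d(x, y) = \sqrt{\sum_{i=1}^{M} (x_i - y_i)^2}$
– Manhattan Distance:
  $d(x, y) = \sum_{i=1}^{M} |x_i - y_i|$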
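Both measures are short to write down in Python. The sketch below uses the standard definitions (square root of the summed squared differences, and sum of absolute differences); the function names are illustrative choices, and the two profiles are assumed to have equal length.

```python
import math


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    p, q = [0.0, 0.0], [3.0, 4.0]
    print(euclidean(p, q))  # straight-line distance
    print(manhattan(p, q))  # distance along the axes
```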

K-means Clustering:
– Break the data into K clusters,
– Start with a random partitioning,
– Improve it by iterating.

K-means Clustering:
– Select the number of clusters, say k,
– Select k random centroids, {m1, m2, …, mk},
– Repeat
  – Assign points (genes in this case) to the cluster of the closest centroid by using a distance measure,
  – Compute new centroids, {m1, m2, …, mk},
– until no change to any centroid.
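The procedure can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. The names `k_means`, `seed` and `max_iter`, the choice of k random data points as initial centroids, and the handling of empty clusters (the old centroid is kept) are all assumptions added here.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, k, dist=euclidean, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Assumption: the initial centroids are k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        labels = [min(range(k), key=lambda j: dist(centroids[j], p)) for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:  # no centroid changed: stop
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(k_means(data, k=2))
```

The result depends on the initial centroids in general, so in practice the algorithm is often run several times with different seeds.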

K-means Clustering:

K-means Clustering:
– Select the number of clusters, say k,
  – There are some methods to determine the optimum k,
  – Assume k is given.

K-means Clustering:
– Select k random centroids, {m1, m2, …, mk},
  – Just randomly assign the centroids.

K-means Clustering:
– Assign points (genes in this case) to the cluster of the closest centroid by using a distance measure,
  – Use the following formula to find the closest centroid to the gene $g_i$:
    $\min_j d(m_j, g_i)$
  – Then assign gene $g_i$ to the closest centroid.
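The assignment step alone, picking the index j that minimizes d(m_j, g_i), might look like this in Python. The function name `closest_centroid` and the default Euclidean distance are illustrative assumptions; any distance measure can be passed in.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def closest_centroid(gene, centroids, dist=euclidean):
    """Return the index j of the centroid m_j that minimizes d(m_j, gene)."""
    return min(range(len(centroids)), key=lambda j: dist(centroids[j], gene))


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [5.0, 5.0]]
    print(closest_centroid([1.0, 1.0], centroids))  # nearer to the first centroid
    print(closest_centroid([4.0, 6.0], centroids))  # nearer to the second centroid
```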

K-means Clustering:
– Compute new centroids, {m1, m2, …, mk},
  – Find the average in the cluster:
    $m_c = \frac{1}{N_c} \sum_{g_i \in c} g_i$
  – Where:
    – $m_c$: centroid of the cluster c,
    – $N_c$: the number of points in cluster c,
    – $g_i$: the points in cluster c.
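As a small worked example, the new centroid is the coordinate-wise average of the points in the cluster. The function name `centroid` is an illustrative choice, and the cluster is assumed to be non-empty.

```python
def centroid(cluster_points):
    """Average the points of one cluster, coordinate by coordinate.

    n_c plays the role of Nc (number of points in the cluster);
    cluster_points must be non-empty.
    """
    n_c = len(cluster_points)
    return [sum(coords) / n_c for coords in zip(*cluster_points)]


if __name__ == "__main__":
    # Three 2-dimensional points; each coordinate is averaged separately.
    print(centroid([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```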

K-means Clustering:
– Repeat until no change to any centroid.
  – Centroids are in the proper places,
  – We cannot observe any further improvement in the centroids,
  – Therefore STOP.

K-means Clustering DEMO:
– Our points,
– Centroids (4 centroids, shown as squares),
– Assign each point to the closest centroid,
– Re-evaluate the centroids,
– Iterate until no change.

K-means Clustering DEMO 2:

Hierarchical Clustering:
– Similar to the construction of a phylogenetic tree,
– A distance matrix for all genes is constructed based on the distances between their expression profiles,
– Neighbor-joining or UPGMA can be applied to this matrix to get a hierarchical clustering,
– Single linkage, complete linkage, and average linkage clustering.

Hierarchical Clustering:
– Single linkage: the distance between two clusters is given by the value of the shortest link between the clusters.

Hierarchical Clustering:
– Complete linkage: the distance between two clusters is given by the value of the longest link between the clusters.

Hierarchical Clustering:
– Average linkage: the distance between two clusters is defined as the average of the distances between all pairs of objects, one taken from each cluster,
– like UPGMA.
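The three linkage criteria, and a naive agglomerative loop that uses them, can be sketched as below. This is an illustration under stated assumptions: the function names, the default Euclidean distance, and the choice to stop at a requested number of clusters (rather than building the full tree) are not from the slides.

```python
import math
from itertools import product


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(a, b, dist=euclidean):
    """Shortest link between any point of a and any point of b."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b, dist=euclidean):
    """Longest link between any point of a and any point of b."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b, dist=euclidean):
    """Average over all pairs with one point from a and one from b."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def agglomerate(points, n_clusters, linkage=single_linkage):
    """Start with one cluster per point and merge the two closest clusters
    until only n_clusters remain (assumed stopping rule)."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(agglomerate([[0, 0], [0, 1], [5, 5], [5, 6]], n_clusters=2))
```

Recomputing every pairwise linkage in each round is simple but slow; it is meant to show the idea, not to scale to thousands of genes.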

Hierarchical Clustering:– Linkage Criteria:

Agglomerative Hierarchical Clustering:

Agglomerative Hierarchical Clustering DEMO:

Self-Organizing Feature Maps:
– by Teuvo Kohonen,
– a data visualization technique which helps to understand high-dimensional data by reducing the dimensions of the data to a map.

Self-Organizing Feature Maps:
– humans simply cannot visualize high-dimensional data as is,
– SOMs help us understand this high-dimensional data.

Self-Organizing Feature Maps:
– Based on competitive learning,
– A SOM helps us by producing a map of usually 1 or 2 dimensions,
– A SOM plots the similarities of the data by grouping similar data items together.

Self-Organizing Feature Maps:

Self-Organizing Feature Maps:
– Input vector and synaptic weight vector:
  $x = [x_1, x_2, \dots, x_m]^T$
  $w_j = [w_{j1}, w_{j2}, \dots, w_{jm}]^T, \quad j = 1, 2, \dots, l$
– Best-matching (winning) neuron:
  $i(x) = \arg\min_j \|x - w_j\|, \quad j = 1, 2, \dots, l$
– The weights $w_i$ are updated.
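A single training step can be sketched as follows. The slides do not state the update rule, so this sketch assumes the commonly used Kohonen rule, w_j ← w_j + η · h(j, i(x)) · (x − w_j), on a 1-dimensional map with a Gaussian neighborhood h. The names `best_matching_unit`, `som_update`, `learning_rate` and `sigma` are illustrative assumptions.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def best_matching_unit(x, weights):
    """Index i(x) = arg min_j ||x - w_j|| of the winning neuron."""
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))


def som_update(x, weights, learning_rate=0.5, sigma=1.0):
    """One training step on a 1-D map (assumed update rule, see lead-in).

    Every neuron j moves towards x, scaled by a Gaussian neighborhood
    centered on the winner, so the winner moves most and distant neurons least.
    The list `weights` is modified in place; the winner index is returned.
    """
    winner = best_matching_unit(x, weights)
    for j, w in enumerate(weights):
        h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
        weights[j] = [wk + learning_rate * h * (xk - wk) for wk, xk in zip(w, x)]
    return winner


if __name__ == "__main__":
    weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(som_update([0.9, 1.2], weights), weights)
```

In a full training run the learning rate and neighborhood width are usually decreased over time, which this single-step sketch leaves out.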

Self-Organizing Feature Maps (EXAMPLE):
– Assume that we want to cluster countries according to their economic potential,
– Each country has N properties (such as export and import amounts, population, …).

Self-Organizing Feature Maps (EXAMPLE):
– Each country is a point in N dimensions,
– That is, each country is a vector of size N,
– We want to cluster the countries according to the similarities in their economic potential.

Self-Organizing Feature Maps (EXAMPLE):

Self-Organizing Feature Maps:
– Similarly, SOM is used to analyze microarray data,
– Similar genes can be observed easily in this way.
