Tutorial 8 Clustering 1

Tutorial 8 Clustering 1. General Methods –Unsupervised Clustering Hierarchical clustering K-means clustering Expression data –GEO –UCSC –ArrayExpress


Tutorial 8

Clustering


Clustering

• General Methods
  – Unsupervised Clustering
    • Hierarchical clustering
    • K-means clustering

• Expression data
  – GEO
  – UCSC
  – ArrayExpress

• Tools
  – EPCLUST
  – MeV


Microarray - Reminder


Expression Data Matrix

• Each column represents all the gene expression levels from a single experiment.

• Each row represents the expression of a gene across all experiments.

        Exp1  Exp2  Exp3  Exp4  Exp5  Exp6
Gene 1  -1.2  -2.1  -3    -1.5   1.8   2.9
Gene 2   2.7   0.2  -1.1   1.6  -2.2  -1.7
Gene 3  -2.5   1.5  -0.1  -1.1  -1     0.1
Gene 4   2.9   2.6   2.5  -2.3  -0.1  -2.3
Gene 5   0.1   2.6   2.2   2.7  -2.1
Gene 6  -2.9  -1.9  -2.4  -0.1  -1.9   2.9

(One Gene 5 value is missing; the source does not show which experiment it belongs to.)


Expression Data Matrix

Each element is a log ratio: log2(T/R).

• T - the gene expression level in the test sample

• R - the gene expression level in the reference sample
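As a small illustration of the log ratio defined above, the sketch below computes log2(T/R) for a few intensity pairs. The function name `log_ratio` and the intensity values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def log_ratio(test, reference):
    """Return log2(T/R) for one gene in one experiment."""
    if test <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test / reference)

# Hypothetical intensities, chosen only for illustration.
print(log_ratio(400.0, 100.0))  # T is 4 times R   -> 2.0
print(log_ratio(100.0, 100.0))  # T equals R       -> 0.0
print(log_ratio(50.0, 200.0))   # T is a quarter R -> -2.0
```

A positive value means the gene is expressed more strongly in the test sample, a negative value means it is expressed more strongly in the reference sample.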



Microarray Data Matrix

• Black indicates a log ratio of zero, i.e. T ≈ R

• Green indicates a negative log ratio, i.e. T < R

• Red indicates a positive log ratio, i.e. T > R

• Grey indicates missing data
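The color convention can be sketched as a small lookup function. This is only an illustration: the name `cell_color` and the `tolerance` that decides when a ratio counts as "about zero" are assumptions, and real heat maps usually vary the color intensity with the size of the ratio rather than using four fixed colors.

```python
def cell_color(log_ratio, tolerance=0.1):
    """Map one matrix cell to a color name.

    `tolerance` is an assumed cutoff for treating a ratio as zero (T ≈ R).
    Missing data is represented here by None.
    """
    if log_ratio is None:
        return "grey"    # missing data
    if abs(log_ratio) <= tolerance:
        return "black"   # T ≈ R
    return "red" if log_ratio > 0 else "green"  # T > R or T < R

# One row with a missing value, for illustration.
row = [0.1, None, 2.6, -2.1]
print([cell_color(value) for value in row])
# -> ['black', 'grey', 'red', 'green']
```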


Microarray Data: Different representations

[Figure: two plots of log ratio against experiment, with a color scale running from T<R to T>R]


A real example

~500 genes, 3 knockdown conditions

Too complicated to analyze without “help”


Microarray Data: Clusters


How to determine the similarity between two genes? (for clustering)

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
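Two commonly used ways to compare expression profiles are Euclidean distance (small when the values are close) and Pearson correlation (high when the profiles rise and fall together). The sketch below applies both to the Gene 1 and Gene 2 rows of the expression matrix. The function names are illustrative, and the slides themselves do not say which measure is used.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Correlation in [-1, 1]; compares the shape of two profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Gene 1 and Gene 2 from the expression data matrix.
gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]

print("Euclidean distance:", euclidean_distance(gene1, gene2))
print("Pearson correlation:", pearson_correlation(gene1, gene2))
```

The two measures can disagree: two genes with the same pattern but different magnitudes are far apart in Euclidean terms yet highly correlated, so the choice of measure affects which genes end up in the same cluster.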


Unsupervised Clustering

Hierarchical Clustering


Genes with similar expression patterns are grouped together and are connected by a series of branches (dendrogram).

[Figure: dendrogram with leaves 1, 6, 3, 5, 2, 4]

Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.


If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).

Hierarchical clustering finds an entire hierarchy of clusters.
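A minimal sketch of bottom-up (agglomerative) hierarchical clustering follows. It assumes Euclidean distance and average linkage, neither of which is specified in the slides, and it uses a toy data set rather than real expression values. Stopping the merging when the requested number of clusters remains gives the same groups as cutting the finished tree at that level.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean distance over all pairs of items taken from the two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # every item starts alone
    merges = []  # (cluster, cluster, distance): the tree, built bottom-up
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Toy profiles (hypothetical values): two tight pairs and one outlier.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 0.0]]
clusters, merges = hierarchical_clusters(profiles, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # -> [[0, 1], [2, 3], [4]]
```

Calling the function with `n_clusters=1` records every merge, which is the full hierarchy that a dendrogram draws.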


Hierarchical clustering result

Five clusters


K-means Clustering

An algorithm that divides the data into K groups.

[Figure: example with K=4]


How does it work?


The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a set of K groups in which each gene is close to the center of its own group. The final grouping can depend on the initial means, so it is a local optimum rather than a guaranteed best solution.

1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).

2. k clusters are created by associating every observation with the nearest mean.

3. The centroid of each of the k clusters becomes the new mean.

4. Steps 2 and 3 are repeated until convergence has been reached.
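The four steps can be sketched in a few lines of plain Python. This is a minimal illustration that assumes Euclidean distance; the `seed` and `max_iter` parameters and the toy points are assumptions added here so that the example is repeatable.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, means[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = centroid(members)
    return assignment, means

# Toy data (hypothetical values): two well-separated groups of three points.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
assignment, means = k_means(points, k=2)
print(assignment)
print(means)
```

On data that is less cleanly separated, different seeds can give different clusters, which is why k-means is often run several times with different starting means.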


Different types of clustering – different results


How to search for expression profiles

• GEO (Gene Expression Omnibus) http://www.ncbi.nlm.nih.gov/geo/

• Human genome browser http://genome.ucsc.edu/

• ArrayExpress http://www.ebi.ac.uk/arrayexpress/


Searching for expression profiles in GEO

• Datasets - suitable for analysis with GEO tools

• Expression profiles by gene

• Microarray experiments

• Probe sets

• Groups of related microarray experiments


Download dataset

Clustering

Statistical analysis


Clustering analysis




The expression distribution for different lines in the cluster


Searching for expression profiles in the Human Genome browser.


Keratin 10 is highly expressed in skin


ArrayExpress

http://www.ebi.ac.uk/arrayexpress/


What can we do with all the expression profiles?

Clusters!

How?

EPCLUST

http://www.bioinf.ebc.ee/EP/EP/EPCLUST/


Edit the input matrix: Transpose, Normalize, Randomize

Hierarchical clustering

K-means clustering

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
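If a matrix is stored with genes as rows and experiments as columns, it has to be transposed to match the layout described here. The sketch below shows a transpose and one possible normalization in plain Python. What the tool's Normalize option actually computes is not stated in the slides, so the row-wise centering and scaling here is an assumption, and the function names are illustrative.

```python
import math

def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]

def normalize_rows(matrix):
    """Center each row at mean 0 and scale it to standard deviation 1.

    Assumed meaning of "normalize"; a constant row becomes all zeros.
    """
    result = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        result.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return result

# Two genes (rows) measured in three experiments (columns).
genes_by_experiment = [[-1.2, -2.1, -3.0],
                       [2.7, 0.2, -1.1]]

experiments_by_gene = transpose(genes_by_experiment)
print(experiments_by_gene)
# -> [[-1.2, 2.7], [-2.1, 0.2], [-3.0, -1.1]]
print(normalize_rows(genes_by_experiment))
```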


Clusters

Data




Graphical representation of the cluster

Samples found in cluster


10 clusters, as requested


MultiExperiment Viewer (MeV)

http://www.tm4.org/mev/