38
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models Gene Set Enrichment Analysis Chao-Jen Wong Fred Hutchinson Cancer Research Center January 28, 2010 1 / 29 Gene Set Enrichment Analysis

Gene Set Enrichment

Embed Size (px)

Citation preview

Page 1: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Gene Set Enrichment Analysis

Chao-Jen Wong

Fred Hutchinson Cancer Research Center

January 28, 2010

1 / 29

Gene Set Enrichment Analysis

Page 2: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

1 Hypergeometric Testing

2 Simple GSEA using Z-score and Permutation

3 GSEA using Linear Models

2 / 29

Gene Set Enrichment Analysis

Page 3: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Gene set enrichment analysis

Unlike per-gene analysis ...

Search for categories where the constituent genes show changes inexpression level over the experimental conditions.

Use predefined gene set such as KEGG pathways, GO classifications,chromosome bands, and protein complexes.

No need to make a cutoff between genes that are differentiallyexpressed and those that are not.

Provided in the GESABase, Category , GOstats and topGO.

3 / 29

Gene Set Enrichment Analysis

Page 4: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Outline

1 Hypergeometric Testing

2 Simple GSEA using Z-score and Permutation

3 GSEA using Linear Models

4 / 29

Gene Set Enrichment Analysis

Page 5: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Hypergeometric testing

Basic concept: Suppose there are N balls in an urn, n are white andm are black. Drawing k balls out of the urn without replacement,how many black balls do we expect to get? What is the probabilityof getting x black balls?

Hypergeometric testing for under- and over-representation of GOterms.

Inputs

1 Gene universe, N.

2 GO categories (categorize genes by GO terms).

3 A list of interesting genes, I , (differentially expressed genes

identified by limma or just simply t-test by rowttests).

5 / 29

Gene Set Enrichment Analysis

Page 6: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Hypergeometric testing

Interesting (Black) Not (White)In GO term n11 n12 K

Not in GO term n21 n22 N − KI N − I N

Suppose there are j interesting genes in the GO term (n11 = j),compute

1 Probability of seeing j or more black balls in K draws.

2 Expected number of black balls seeing in K draws.

6 / 29

Gene Set Enrichment Analysis

Page 7: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Define gene universe (a vector of Entrez Gene IDs).

Select a list of interesting genes (a vector of Entrez Gene ID).

Code: gene selection via t-test

> library(genefilter)> library(day2)> library(hgu95av2.db)> data(ALLfilt_bcrneg)> ttests <- rowttests(ALLfilt_bcrneg, "mol.biol")> ## select interesting genes> smPV <- ttests[ttests$p.value < 0.005, ]> selectedEntrezIds <- unlist(mget(rownames(smPV),+ hgu95av2ENTREZID))> entrezUniverse=unlist(mget(featureNames(ALLfilt_bcrneg),+ hgu95av2ENTREZID))

7 / 29

Gene Set Enrichment Analysis

Page 8: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Define gene universe (a vector of Entrez Gene IDs).

Select a list of interesting genes (a vector of Entrez Gene ID).

Code: gene selection via t-test

> library(genefilter)> library(day2)> library(hgu95av2.db)> data(ALLfilt_bcrneg)> ttests <- rowttests(ALLfilt_bcrneg, "mol.biol")> ## select interesting genes> smPV <- ttests[ttests$p.value < 0.005, ]> selectedEntrezIds <- unlist(mget(rownames(smPV),+ hgu95av2ENTREZID))> entrezUniverse=unlist(mget(featureNames(ALLfilt_bcrneg),+ hgu95av2ENTREZID))

7 / 29

Gene Set Enrichment Analysis

Page 9: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Hypergeometric testing

Create a GOHyperGParams object.

Code: GOHyperGParams

> library(GOstats)> hgCutoff <- 0.001> GOparams <- new("GOHyperGParams",+ geneIds=selectedEntrezIds,+ universeGeneIds=entrezUniverse,+ annotation="hgu95av2.db",+ ontology="BP",+ pvalueCutoff=0.001,+ conditional=TRUE,+ testDirection="over")

8 / 29

Gene Set Enrichment Analysis

Page 10: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Outputs and summary.

Code: hyperGTest

> hgOver <- hyperGTest(GOparams)> class(hgOver)> summary(hgOver)

Exercise: generate report using htmlReort.

> showMethods("htmlReport")> htmlReport(hgOver, file="hgResult.html")> browseURL("hgResult.hrml")

9 / 29

Gene Set Enrichment Analysis

Page 11: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Outputs and summary.

Code: hyperGTest

> hgOver <- hyperGTest(GOparams)> class(hgOver)> summary(hgOver)

Exercise: generate report using htmlReort.

> showMethods("htmlReport")> htmlReport(hgOver, file="hgResult.html")> browseURL("hgResult.hrml")

9 / 29

Gene Set Enrichment Analysis

Page 12: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Lab activity

1 Chapter 14: read and do the exercises in Section 14.3 and 14.4.

2 Use the topGenes dataset (load the data using data(topGenes))and find a subset of genes whose adj.P.Val are less than 0.01.

3 Repeat the conditional Hypergeometric testing to find under- andover-represented biological processes.

4 Generate html reports.

10 / 29

Gene Set Enrichment Analysis

Page 13: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Outline

1 Hypergeometric Testing

2 Simple GSEA using Z-score and Permutation

3 GSEA using Linear Models

11 / 29

Gene Set Enrichment Analysis

Page 14: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Simple GSEA

Consider two group comparison

Start with data quality assessment.

Compute per-gene t-statistics: tk for each gene k.

Null hypothesis: no difference in mean expression

Ho : ZK = 0

ZK =1�|K |

k∈K

tk ∼ N (0, 1),

where K denotes the gene sets, and |K | the number of genes in thegene set.

Alternative approach: use permutation test to assess which genesets have an unusually large absolute value of zK .

12 / 29

Gene Set Enrichment Analysis

Page 15: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

ALLfill bcrneg> library(ALL)> library(hgu95av2.db)> data(ALL)> bcell <- grep("^B", as.character(ALL$BT))> types <- c("NEG", "BCR/ABL")> moltyp <- which(as.character(ALL$mol.biol) %in% types)> # subsetting> ALL_bcrneg <- ALL[, intersect(bcell, moltyp)]> ALL_bcrneg$BT <- factor(ALL_bcrneg$BT)> ALL_bcrneg$mol.biol <- factor(ALL_bcrneg$mol.biol)> # nonspecific filter: remove genes that does not> ## show much variation across samples> library(genefilter)> filt_bcrneg <- nsFilter(ALL_bcrneg,+ var.cutoff=0.5)> ALLfilt_bcrneg <- filt_bcrneg$eset

13 / 29

Gene Set Enrichment Analysis

Page 16: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Data representation: create an incidence matrix Am where aij = 1 ifgene j is in gene set i and aij = 0 otherwise.

> library(KEGG.db)> library(GSEABase)> gsc <- GeneSetCollection(ALLfilt_bcrneg,+ setType=KEGGCollection())> Am <- incidence(gsc)

ExpressionSet object retains only those features that are in theincidence matrix Am.

> nsF <- ALLfilt_bcrneg[colnames(Am), ]

14 / 29

Gene Set Enrichment Analysis

Page 17: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Exercise

1 How many gene sets and how many genes are represented by theincidence matrix Am?

2 How many gene sets have fewer than ten genes in them?

3 What is the largest number of gene sets in which a gene can befound?

4 What is the name of this gene set? (use KEGGPATHID2NAME)

Code> dim(nsF)> dim(Am)> nGene <- rowSums(Am)> rownames(Am)[nGene < 10]> sort(nGene, decreasing=TRUE)[1]> KEGGPATHID2NAME[["05200"]]

15 / 29

Gene Set Enrichment Analysis

Page 18: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Exercise

1 How many gene sets and how many genes are represented by theincidence matrix Am?

2 How many gene sets have fewer than ten genes in them?

3 What is the largest number of gene sets in which a gene can befound?

4 What is the name of this gene set? (use KEGGPATHID2NAME)

Code> dim(nsF)> dim(Am)> nGene <- rowSums(Am)> rownames(Am)[nGene < 10]> sort(nGene, decreasing=TRUE)[1]> KEGGPATHID2NAME[["05200"]]

15 / 29

Gene Set Enrichment Analysis

Page 19: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Compute the per-gene test statistics using the rowttests function.

> rtt <- rowttests(nsF, "mol.biol")> names(rtt)

[1] "statistic" "dm" "p.value"

> rttStats <- rtt$statistic

Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.

> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]

Compute zk for each pathway: zK = 1√|K |

�k∈K tk .

> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)

16 / 29

Gene Set Enrichment Analysis

Page 20: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Compute the per-gene test statistics using the rowttests function.

> rtt <- rowttests(nsF, "mol.biol")> names(rtt)

[1] "statistic" "dm" "p.value"

> rttStats <- rtt$statistic

Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.

> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]

Compute zk for each pathway: zK = 1√|K |

�k∈K tk .

> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)

16 / 29

Gene Set Enrichment Analysis

Page 21: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Compute the per-gene test statistics using the rowttests function.

> rtt <- rowttests(nsF, "mol.biol")> names(rtt)

[1] "statistic" "dm" "p.value"

> rttStats <- rtt$statistic

Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.

> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]

Compute zk for each pathway: zK = 1√|K |

�k∈K tk .

> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)

16 / 29

Gene Set Enrichment Analysis

Page 22: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Exercise

1 Which pathways have remarkably low (< 5) and high aggregatestatistics (> 5)?

2 What is the name the pathway that has the lowest zk score?

3 Use KEGG2heatmap to plot a heatmap for the genes in thispathway.

Code> smPW <- tAadj[tAadj < -5]> mget(names(smPW),KEGGPATHID2NAME)> lgPW <- tAadj[tAadj > 5]> mget(names(lgPW), KEGGPATHID2NAME)

17 / 29

Gene Set Enrichment Analysis

Page 23: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Using KEGG

Exercise

1 Which pathways have remarkably low (< 5) and high aggregatestatistics (> 5)?

2 What is the name the pathway that has the lowest zk score?

3 Use KEGG2heatmap to plot a heatmap for the genes in thispathway.

Code> smPW <- tAadj[tAadj < -5]> mget(names(smPW),KEGGPATHID2NAME)> lgPW <- tAadj[tAadj > 5]> mget(names(lgPW), KEGGPATHID2NAME)

17 / 29

Gene Set Enrichment Analysis

Page 24: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

KEGG2heatmap

> KEGG2heatmap("03010", nsF, "hgu95av2")

11005

28036

65005

64001

08012

28006

28042

28001

26003

28031

22013

12026

22010

28005

24022

68003

28021

27003

12012

43004

09017

24018

22009

28007

24001

24008

27004

03002

28024

04010

33005

24011

04016

62001

57001

64002

20002

30001

16009

49006

24010

36002

25006

28019

01010

28037

08001

06002

12019

43007

68001

28035

24017

04008

28023

48001

09008

15005

25003

26001

37013

14016

28043

28044

62002

62003

12007

31011

08011

28047

04007

22011

84004

01005

43001

08024

12006

15001

43012

31948_at33661_at41214_at34316_at32394_s_at1151_at39856_at39916_r_at41765_at31511_at34570_at31907_at33485_at36786_at33668_at31546_at31955_at32330_at34085_at31956_f_at33619_at32440_at36333_at35125_at31509_at32315_at33674_at31505_at34592_at32437_at36358_at33117_r_at1653_at

18 / 29

Gene Set Enrichment Analysis

Page 25: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Permutation testing

Assess the significant gene sets with respect to a referencedistribution build by a number of permutations.

gseattperm: permute the sample labels.

Return p-value w.r.t. to a reference distribution:Lower: proportion of permutation t-statistics that were smaller than

the observed t-statisticsUpper: proportion of permutation t-statistics that were larger than

the observed t-statistics

Code: using gseattperm> library(Category)> set.seed(123)> pvals <- gseattperm(nsF, nsF$mol.biol, Am2, 1000)> pvalCut <- 0.05> lowC <- rownames(pvals)[pvals[, 1] <= pvalCut]> unlist(getPathNames(lowC), use.names=FALSE)

[1] "Glycerophospholipid metabolism"

[2] "Ribosome"

[3] "RNA polymerase"

[4] "DNA replication"

[5] "Mismatch repair"

[6] "Homologous recombination"

19 / 29

Gene Set Enrichment Analysis

Page 26: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Outline

1 Hypergeometric Testing

2 Simple GSEA using Z-score and Permutation

3 GSEA using Linear Models

20 / 29

Gene Set Enrichment Analysis

Page 27: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Chromosome bands

Use the mapping of genes to chromosome bands.

To answer whether there are anomalies in the pattern of geneexpression that related to chromosome bands.

Use GSEA linear models.

Human chromosome 12

p13.33

p13.32

p13.31

p13.2

p13.1

p12.3

p12.2

p12.1

p11.23

p11.22

p11.21

p11.1

q11

q12

q13.11

q13.12

q13.13

q13.2

q13.3

q14.1

q14.2

q14.3

q15

q21.1

q21.2

q21.31

q21.32

q21.33 q22

q23.1

q23.2

q23.3

q24.11

q24.12

q24.13

q24.21

q24.22

q24.23

q24.31

q24.32

q24.33

Figure: Ideogram for human chromosome 12. The shaded bands togetherrepresent 12q21. Notice that the chromosome bands are hierarchicallynested, and they almost form a partition. (D. Sarker et. al. 2007)

Reference

”Using Categories defined by Chromosome Bands”by D. Sarker et. al.

21 / 29

Gene Set Enrichment Analysis

Page 28: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Consider the comparison of BCR/ABL and NEG groups.

Use ALL_bcrneg object.

Use nsFilter to remove probes with no Entrez Gene ID and nomapping to a chromosome band. Ensure that each Entrez Gene IDmaps to exactly one probeset which has the highest IQR. Alsoremove probes with lack of variation (var < 0.5).

Code: nonspecific filtering

> ALLfilt <- nsFilter(ALL_bcrneg, require.entez=TRUE,+ remove.dupEntrez=TRUE,+ require.CytoBand=TRUE,+ var.func=IQR,+ var.cutoff=0.5)$eset

22 / 29

Gene Set Enrichment Analysis

Page 29: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Consider the comparison of BCR/ABL and NEG groups.

Use ALL_bcrneg object.

Use nsFilter to remove probes with no Entrez Gene ID and nomapping to a chromosome band. Ensure that each Entrez Gene IDmaps to exactly one probeset which has the highest IQR. Alsoremove probes with lack of variation (var < 0.5).

Code: nonspecific filtering

> ALLfilt <- nsFilter(ALL_bcrneg, require.entez=TRUE,+ remove.dupEntrez=TRUE,+ require.CytoBand=TRUE,+ var.func=IQR,+ var.cutoff=0.5)$eset

22 / 29

Gene Set Enrichment Analysis

Page 30: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Compute per-gene t-statistics using limma.

Code: moderate t-statistics> library(limma)> design <- model.matrix(~0 + ALLfilt$mol.biol)> colnames(design) <- c("BCR/ABL", "NEG")> contr <- c(1, -1)> fit1 <- lmFit(ALLfilt, design)> fit2 <- contrasts.fit(fit1, contr)> fit3 <- eBayes(fit2)> tlimma <- topTable(fit3, number=nrow(fit3),+ adjust.method="none")> ## annotation> entrezUniverse <- unlist(mget(tlimma$ID,+ hgu95av2ENTREZID))> tstats <- tlimma$t> names(tstats) <- entrezUniverse

23 / 29

Gene Set Enrichment Analysis

Page 31: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Data preparation

Compute per-gene t-statistics using limma.

Code: moderate t-statistics> library(limma)> design <- model.matrix(~0 + ALLfilt$mol.biol)> colnames(design) <- c("BCR/ABL", "NEG")> contr <- c(1, -1)> fit1 <- lmFit(ALLfilt, design)> fit2 <- contrasts.fit(fit1, contr)> fit3 <- eBayes(fit2)> tlimma <- topTable(fit3, number=nrow(fit3),+ adjust.method="none")> ## annotation> entrezUniverse <- unlist(mget(tlimma$ID,+ hgu95av2ENTREZID))> tstats <- tlimma$t> names(tstats) <- entrezUniverse

23 / 29

Gene Set Enrichment Analysis

Page 32: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Linear models

Per−gene t−statistics

12q21.1

12q21

12q2

12q

All

−5 0 5

!! !

!!!

!!! !!

!! !

!!! ! !! !

! !! !!! !

! !! !! !!!!! !! ! !

! ! ! !!!

!!!!! !

!!! !!! !

!!!! !

!!!!!

!! !! !! !!!

! !!!

!!!

!! !!!

!!

!! !!!!! !

!!! ! ! !!

! !!

!!

! !!

!!! ! !! !! ! !!! !!

!!

! !!! !

!!!! !

! !! !!!! ! !

! !!!! !! !

!!!!!! !! ! !!

!!! ! !!

! !!! !

!! !!! !!

!!

! !!!! ! !! !

! !! !!!!! !

!!! !!!! ! !! ! !

!! ! !!!

!!!

!! !!! ! !!! !!! !

! !! !!!!

!!!

!!! !

!!! !!

! ! !!!! ! !! !! !! !

! !!! !!

!! !

!!!

!!

!!!!!!

! !! !

!!

!!!!

!!!

! !!!

!! !

!! ! !

!!

! ! !!!

!! !

!! !!

!!!!!!! !

!!

!!! !! !!! ! !

!! !! !! !!! ! !!!

!!!

!! !

! !!!! !!! !

! !!!!! ! !!! !

!!!

!! ! !! !! !!

!! ! ! !! !

! !!

! !!

!!! !!! !

! !!! !! !! !!! ! !!! !!!!

!!!

!!

!!!! ! !! !!

! !!! !!

!!!!

! !! !!

!!!! ! !!! !

!! !!! !! !

! ! !! !! !!!!! !

!!!

!!!

!!!

!!

!!! !!

!!!! !! !

! !!!!

!! !! ! !! !

!! !! !!!

! ! !! !!! !

!! !!! !! !! !!!! !!!! !!

! ! ! !!!

!! !!!!! !! ! !! ! !!

!! !!! !!!!

!! ! !! !!

!! ! ! !!!

!!!

!!!!! !!!! !!!

! ! !!! !!!! !!! !! !

! !!!

!!!!

! !! !

!!!

!!! !! ! !! !

!! !!! !

!!

!! !!!!!!

!!

!!!!!! !!! !

!! ! ! ! !

!!

! !!

!!!!! !!

!!!!!

!!!

!!!

!!!! !!

!! !!

!!!

!!

! !! !!! ! !! !!

! !! ! !

!!!

! !! !!!! !!! !! !

!! !!

!! !!

!!

! !!!! !! ! !! !!! !! !

! !!! ! !!! !!

!!

! !! !!!

!!

!!! ! !!!!

!!

!!!! !!

! !!!!!!

! !! !! !! ! !!!! ! !

! !!! !! !! ! !!!!!! !!

! !! ! !!!!

!! !!!!! !!

!!!

!!

!!! !

!!!!

!! ! !!

!! !!!! !

! !! !!

! ! !! !!

!!! !! ! ! !!! !! !! !

!!!

!! ! !!!

! !!!! !! !!!

!!

! !!!

!! !! ! !!! !!!

!!

!!!

!!

!!!

!! !! ! !!! !!

!!! ! !! !

!!

!! !!

!!! !!

!! !!!!

!! !!!! ! !!!!

! !!

! ! !! ! !!! !!! !! !! !!! !!!!!

!!! !! ! ! !

!!!

!!

!! ! ! !!!!!!

!!!

! !!

!!! ! !! ! !!! !! !

! !! !! ! !!

! !!

!!

!! !! !! !!!! !

!!

!!!

!!! !

!!!

!! !!! !

!!! ! !

!!!! ! !

!!

!!!!

!!!!! !!! ! !

!! !!!! ! ! !!!! ! !!!

! !! !! !! !! ! ! !!! !

!!! !! !!!! !! !!

!! !! !!!! !! !

!!

!!

! !!!

!!

!! !! !

!! !!

! ! !! !!

! !!

! !!! ! !

!!! ! !!!!! !

!! !!!

!! ! !!

!!! ! !! !

!!

!!!! !

! !!!!

!!!

! ! !!!

!! !!

!! ! !!! ! ! !

!!! !!! ! !! !!

!!! !!!

!! !!!!

!! ! !! !! !!!

!! !! ! !

!!! !

! !!!

!! !!! !

! !!!! !

!! !

!!! ! !! !

! !!! ! !

!!!!! ! !!!

! !!

!!

!!!!

!! !!!!! !!!! !

!! !

!! !!

!!!!!! !!!!

!!

! !!!! !! !

!!!

! !!!!!

!!! !! !

!!! ! ! !! !

!!!!! !! !! ! !

!!!

! !!! !! !!

!! !!!!!

!!

!!

!!!! !

!!!! !

! !!!!!

!! ! !!!

! !! !!! !! !!

!!

!! !!

!! !! ! ! !

!!! !

! !! ! !!!!! !

!! !!!!!!!

!!!

!!!

!!

! !!!!!

! !!!

!!

!! !

! !!!!

! !! !!! !!

!!!! ! !! !

!!! !!! !

!! !!

!! !! !

!!!!!! !

!!

!!!! ! !

!! !!

! ! ! !! ! !! !!! !! ! !! !! !! !!

!! !!

!! !!!

! !!!!!!

! ! !! !! ! !!!

!!! !!! !!!! !!

!!! !! ! !!! !

!!!

! !!

!! !!! ! !! !!! ! !! !

!!!! !!

!!!! !! !!!!

! !!!!

!!!

!!! !! !

!!!!

!!!!

!!! !!! !

!!! ! !!!! !

!! !!

!!! !! !!! !!

! !! !!!

!!!

!! !!! !!!

! ! !!!!!! !!!

!!!! !! ! !! ! !!

!!!!!!

!!!! ! ! !

!!

!!!

!!

!!

! !!! ! ! !

! !! !!! !! ! !

!!!

!!

! !! !!

!!!!!! ! !!

!!! !!

! ! !!!!!! !! !

! !! !! !! !!!

!! !!

!!! !! !!!

!!!! ! !!!

!! !!! !!! !

!!

!!!! !! !

! !!!!

!! !

!!! !!! !! !! !!

! !!!

!! !!!! !!! !!

!! !

!!

!!! ! !! !

!!! !

!!

! ! !!

!! !

!!

!!

!!!! !!! !

!! ! !

!! ! !!! !!

! !!! ! !! !!!!

! !!! !!! !! !!!

! !!

!!! !

!! !!!!! !!!!! ! !!

!!!! !

!!! !!!! !!! ! !

!!!! !!!!

!! !! !! !!!

!!

!!! !!!!

!!! !!

! !!! !!! !!

!!!

!! ! !

!!

!! ! !! !!!!

!!

! !!

!!!

! !! !! !!! ! !

!! !! !!!

! ! !!!

!! !!!

! !!!! !

!! !!!

!!! !

!!!!

!!!! ! !! !

!!

!! ! !!!

!!! !! ! !!

!! !!

! !!!!

!! !! !

!!!! !!

!!

!!! !!!!

!! !

! ! ! !!!! !! !

!!! !!! !! ! !

! !! !! !!! !! !!! !

!! !!

!!!! !! !!

!!!!!!! !

! !!!

!!! !!

!!!

! !! !!

! !!

!!! !!!!

!!

! !!! !

! !! !! !!!! !! !! !

!!

!! !!!! ! ! ! !!! !! !

! ! !!! !!

! !!

!!

!!

! !! ! !! !! !

!!!

!!!

!!! !!! !! ! !!!! !! !! !!!!! !!

!! !!!! !

!! !! !!!

!! !!!! !!!! ! !!! !!

!!! !!!!

!! !

! !!

!!! !!! !! !!! !

!!

! !!!! ! !

! ! ! ! !!

!! !! !

!!! !

!! !!!! ! !

!!

! !!!! !!

! !!! ! !! !!!!

!!!

! !!

! !!!

!!!

! !! !! ! !!! !!

!! !! !

!!! !! ! !! !! !!!!!! !! ! !

!! !! !!

!! ! !! !!

!! !!

!! !! !

! !!! !

!! ! !!!

!! !!

!! !! !! !

!!!

! ! !! !! !

!!!! !! !!! !

!!! !!!!

!!! !!! !!! !! !!! !!!!! !! !

!!!

!!

!! !! !! !

! !! !! !! !! !

!!

!!! !! !!!! !! !!

!! !!!! !!

! !!

!! ! !!! !! !

! !!!!!!

! !! !!

! !!! ! !

!!!

!!

! ! !!! ! !

!!

!!!! !!!!! !!!

! ! !! !! !

!!! !!! !

!! !! !! !! !!

!! !! !

!!! !! !!

!!

! !! !

!!! !!

! !!!! !! !! ! !

! !!!

!!

!! ! !

!! !!! ! !

!!!

!! !!! !

!!!

!!! !!!

! !! !

!!!!!!!! !! !! !!

!!

!! !! !! !!!

!!! ! !!!! !

!! ! !! ! ! !!! !! !! !!

! !!

! !! !!!!!

!!!! !!

! !!

!!!!!!

!!! !! !! !

!! !! !

! ! !! !!

!! ! !!!

!!!!

! ! ! !! ! !!!!

!! !! ! !! ! !! !

!!

! !!

!!!! !! ! !!

! ! !! !

!!

!! !!

! !! ! !!! !

!!!! !!

!! !! !

! !!! !

!!

!!! !

!! ! !!

!!! !!! !!

!!!! ! !

!!!! ! !!

!!

!!

!!!

!!! !

!!!

!!!! !!! ! !! !

!!

!!

!!!

!!

!!

! !! !!! !! !!!! ! !!

!!! ! !! !

!!! !!! ! !

!!

!! ! !! !!!!! !! ! !!! ! !

!!!

!! !! !!

!!!

!!! !

!!! !! !!

! ! !!! !! !!!

!!!! ! !!!

!! !

!!! !

!!!! !

! !!!!!

! !!!!! ! !

!!!!!

!!!!

!! !! !

! ! ! !!! !!

!! !! ! !!! !!

!! !!!

! !! ! !! !!!

! !!! !!

! !! !!

!!

!!

!!

! !!!! !

! !! !! !!

!! !!

! !! !!!

! !! ! !! !! !! !!

!!!! !! ! !! !

!!! ! !! !!

! !! ! !!

!!!

!!

!!! ! !

!!

!!!

!!

! !!

!!!!!

!!

!!! !

! !!!

!!!!

!!

!! !!

!!

!!

! !!

!!!!!

!!! !

! !! !!! ! !!! ! !!!! ! !

! !! !! !

!!!! !!!! ! ! !!! ! !!! ! !! !!!!!

!!! !! ! !!! ! !!! !! !!! !! !! !!! !

!!!

! ! !! !!!! !! !!

!!

!!!

!!

!!!! !

!!

!! !!! !!!! !

!! ! !! !! !!!

! !!!

!! !

! !! !!!! !!

! !!

!!!

! ! !! !!!

! !!!! !!

!! !!!! !! !

!! !! !!!!! !!

!!!!

! !!! !! !! !!

!! !!! !

! !!!!

! !! !! !

!!!

! !! !!! !!!!

!!! !

!!

!!! !

! !!!

!!!!!

!! !!! !

! !! !!!!

! !!! !!!! !!

! !! !!! !!

!! ! !!! !!

!!! ! !!!

!! !

!!!

!! !!!!! !!

!!! !

!! !!!!

! !! !!!! ! !!

!!! !!!!

!!!! !

!!!!! !!

!!! !!

!!!! !!! !! ! !

! !! !!! !!! !!

!!

!!! !! !

!!! ! !! !!! !

! ! !! !! !!! !!

!!

! !!!! ! ! ! !!! ! !!!

!!!!!!!!!!!

! !! !!!

!!!! !

!! !

!!

!!!!!! !! !

!!! !!!

!! !! !!! ! !!! ! !!

!!!! ! !! !! !

!! !! ! ! !!! !!!! !

! !!

!!

!!

! !!!!

! !!! !

!! !!

!!! !

!!! !

!! !!!

!!!! !!

! ! !!!!!

!!

! !!!!! ! !!! !!!!

! !!! !! !! !! !!! ! ! !

!! !

! ! !! ! !!!

!!

!!!

!! !! !

!! !! !

! !!! !!! !!! ! !

! !! ! !

!!! !

!!!!

!!

!!!

!! !!!! !! !

! !! !!! !

! ! !!!!! !

! !!! !!!! !

! ! !!! !!

! !! ! !! !!!

!!

!! !! !!! !!

!!!

! !!!

!!

! !!!!

!! !! !! !! !!

!!

!!!!

!!!!

!! !!!! !!

! !! !!! !!

!!! !!!!! !

!! !

!!!!!!! !!! ! !!

! !!! ! !! !

!!! !

! !!!! !

!! !! !! !! !! !! !

!!

! !!! ! !!!

!! !!! !!!

! !!! !

!!

! !!! ! !!

!! ! !! !!!!! !!! ! !

! !! !!! !!!

!! !!!! !! !! ! ! !

! ! !!!! !! ! !! !

!! !!!

! ! !!!! !!

!!

!! !! !!!!! !

!! !! !

!!! ! !!

!!!! !!

!! !! ! !!! !!

!!! !!

!! !!!!

!! ! !!!

!!!

!!

! ! !!! !!! ! !!!! !!

!! !!! !! !

! !! !! !! !! !

! !!! !

!!! ! ! !!

!!

! !!! !! !!!!!

!!! !!

! ! !!! !!! ! ! !! !

!!!

!! !!! !

!! !!

! !!!! ! !! !!!

!! ! !!

!!! ! !!

! !!! ! !!!!! !!

!! !

! !!! !! !! !!! !

!! !! !!!

!!! !!

!!! !! !

!!

!! ! !!

!! !! !! !! !! !!!

!!!

! ! !!

!! !! !!!!

! !!!

! ! !!

! !!!!!

!! ! ! !!!

!!!!! !!! !

!!

! !!!!

!!

! ! !! ! !!!!!

!! !!

!!!! !! !!!! !! !

Fitting linear model with per-gene t-statistics: for each category j ,

yi = β0 + β1aij + εi ,

where aij = 1 if gene i is associated with category j , and 0otherwise. The index i may range over from universal genes to asubset of genes.

β1 ∼ N (0, 1)

Null hypothesisH0 : β1 = 0

24 / 29

Gene Set Enrichment Analysis

Page 33: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Linear models

Create a ChrMapLinearMParams object.

Code: instance of class ChrMapLinearMParams

> library(Category)> params <- new("ChrMapLinearMParams",+ conditional=FALSE,+ testDirection="up",+ universeGeneIds=entrezUniverse,+ geneStats=tstats,+ annotation="hgu95av2",+ pvalueCutoff=0.01,+ minSize=4L)

25 / 29

Gene Set Enrichment Analysis

Page 34: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Calling the linearMTest function

linearMTest: compute the p-values for detecting up- ordown-regulation of predefined gene sets.

Code: linearMTest

> lman <- linearMTest(params)

> lman

> summary(lman)

26 / 29

Gene Set Enrichment Analysis

Page 35: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Exercise

1 Get familiar with the structure of ChrMapLinearMParams class?ChrMapLinearMParams or help(“ChrMapLinearMParams-class”)

2 Perform conditional GSEA linear models to find interestingchromosome bands that are up-regulated.

3 Summarize the result of the conditional test using summary.

Code: conditional test> slotNames(params)> paramsCond <- params> paramsCond@conditional <- TRUE> lmanCond <- linearMTest(paramsCond)> summary(lmanCond)

27 / 29

Gene Set Enrichment Analysis

Page 36: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Exercise

1 Get familiar with the structure of ChrMapLinearMParams class?ChrMapLinearMParams or help(“ChrMapLinearMParams-class”)

2 Perform conditional GSEA linear models to find interestingchromosome bands that are up-regulated.

3 Summarize the result of the conditional test using summary.

Code: conditional test> slotNames(params)> paramsCond <- params> paramsCond@conditional <- TRUE> lmanCond <- linearMTest(paramsCond)> summary(lmanCond)

27 / 29

Gene Set Enrichment Analysis

Page 37: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Summary

1 Basic idea behind GSEA.

2 Simple GSEA: t-tests and permutation.

3 Using KEGG categories.

4 Linear models and chromosome band categories.

5 Hypergeometric testings on GO BP terms.

28 / 29

Gene Set Enrichment Analysis

Page 38: Gene Set Enrichment

Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models

Reference

Assaf P. Oron et. al., Gene set enrichment analysis using linearmodels and diagnostics, Bioinformatics, vol. 24 no. 22, pp.2566-2591, 2008.

Florian Hahne et. al., Bioconductor Case Studies, chapter 13-14,Springer, 2008.

Deepayan Sarker et. al., Using Categories defined chromosomebands, Bioconductor Category package vignette.

D. Sarker et.al., Modeling gene expression data via chromosomebands, Bioinformatics, 2007.

29 / 29

Gene Set Enrichment Analysis