Chapter 8: Biological Knowledge Assembly and Interpretation Ju Han Kim Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea, Presenter: Zhen Gao


Outline

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

Overview of the analysis workflow
  • Input: microarray / RNA-Seq data
  • Analysis: differentially expressed genes (DEG), co-expression / clustering, gene set-wise differential expression analysis, differential co-expression analysis
  • Result: a gene of interest, gene list, gene pair, or gene list pair
  • Functional annotation analysis (FAA): Gene Ontology (GO) or pathway analysis
  • Result: gene list with annotations
  • Visualization, semantic assembly, and knowledge learning: concept lattice analysis (BioLattice)

Glossary
  • FAA: Functional Annotation Analysis
  • GO: Gene Ontology
  • DEG: Differentially Expressed Genes
  • GSEA: Gene Set Enrichment Analysis

Pathway and Ontology-Based Analysis

GO and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes. It is applied to a list of genes obtained by:
  • differential expression analysis
  • co-expression analysis (or clustering)

TF-DNA interaction is determined not only by sequence specificity but also by structural and chemical properties.


Attributes that can be applied for FAA:
  • transcription factor binding
  • clinical phenotypes such as disease associations
  • MeSH (Medical Subject Headings) terms
  • microRNA binding sites
  • protein family memberships
  • chromosomal bands, etc.
  • GO terms
  • biological pathways

Features may have their own ontological structures.

GO is structured as a DAG (Directed Acyclic Graph).
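Because a DAG lets a term have more than one parent, an annotation to a specific term implies annotation to every ancestor of that term. A minimal sketch of that propagation follows. The term names and the `PARENTS` table are a toy ontology invented for illustration, not real GO content, and `ancestors` and `propagate` are hypothetical helper names.

```python
from typing import Dict, List, Set

# Toy ontology (NOT real GO content): each term maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "cell_death": ["root"],
    "signaling": ["root"],
    "programmed_cell_death": ["cell_death"],
    # A DAG, unlike a tree, allows a term to have two parents.
    "apoptotic_signaling": ["programmed_cell_death", "signaling"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(direct: Dict[str, Set[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Give each gene its direct terms plus all ancestors of those terms."""
    full: Dict[str, Set[str]] = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


annotations = propagate(
    {"geneA": {"apoptotic_signaling"}, "geneB": {"cell_death"}}, PARENTS
)
```

Here `geneA` ends up annotated with both branches above `apoptotic_signaling`, which is one reason categories in such a structure are not independent of each other.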

DEGs

Three techniques that help obtain DEGs:
  • t-test
  • Wilcoxon's rank-sum test
  • ANOVA

Note that the multiple-hypothesis-testing problem must be properly managed.
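The slides do not say which correction to use. As one common choice (an assumption here, not the presenter's stated method), the Benjamini-Hochberg procedure converts per-gene p-values into adjusted values that control the false discovery rate. The p-values below are made up for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a t-test, rank-sum test, or ANOVA.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
```

With these numbers, four genes pass an uncorrected 0.05 cutoff but only the first two remain after adjustment.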

Co-expression analysis
  • Puts similar expression profiles together and different ones apart
  • Returns genes that are assumed to be co-regulated

Clustering algorithms:
  • hierarchical-tree clustering
  • partitional clustering
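To make the idea concrete, here is a small sketch of hierarchical (agglomerative) clustering. It assumes 1 minus the Pearson correlation as the distance and average linkage as the merge rule; both are illustrative choices, and the profiles are invented. A profile with zero variance would cause a division by zero in this simplified version.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles: Dict[str, List[float]],
                    n_clusters: int) -> List[List[str]]:
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in profiles]

    def dist(c1: List[str], c2: List[str]) -> float:
        # Average of (1 - correlation) over all cross-cluster gene pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression profiles over four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
clusters = average_linkage(profiles, n_clusters=2)
```

The rising profiles (`g1`, `g2`) and the falling profiles (`g3`, `g4`) end up in separate clusters.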

Pathways are powerful resources for understanding shared biological processes.
Examples: KEGG, MetaCyc, and BioCarta (signaling pathways)

MetaCyc:
  • An experimentally determined, non-redundant metabolic pathway database
  • The largest such collection, containing over 1400 metabolic pathways

Ontology / GO:
  • Provides a shared understanding of a certain domain of information
  • Controlled vocabularies

GO consists of DAG structures with three vocabularies:
  • Molecular Function (MF)
  • Cellular Component (CC)
  • Biological Process (BP)

Other common ontologies and vocabularies:
  • MIPS: integrated source; protein properties; a variety of complete genomes
  • MeSH: clinical terms, including disease names
  • OMIM (Online Mendelian Inheritance in Man)
  • UMLS (Unified Medical Language System)

GO enrichment test:
For example, if 20% of the genes in a gene list are annotated with the GO term "apoptosis" while only 1% of the genes in the whole human genome fall into this functional category, the term is over-represented (enriched) in the list.

Common statistical tests:
  • chi-square test
  • binomial test
  • hypergeometric test

Hypergeometric test:
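The formula on this slide did not survive in the transcript. As a hedged sketch, the usual upper-tail form asks how likely it is to see at least k annotated genes in a list of n genes drawn from a background of N genes, K of which carry the term. The symbols N, K, n, k and the numbers below are illustrative assumptions, chosen to echo the 20% versus 1% example.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the term of interest."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical numbers: 1% of a 20000-gene background carries the term,
# while 10 of the 50 genes in the list (20%) carry it.
p_value = hypergeom_enrichment_p(N=20000, K=200, n=50, k=10)
```

Changing `N` and `K` to reflect a different background (for example, only the genes on the array) changes the p-value, so the background should be chosen deliberately.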

Pitfalls to avoid when using the hypergeometric test:
  • The choice of background has a substantial impact on the result. Candidates include:
      • all genes having at least one GO annotation
      • all genes ever known in genome databases
      • all genes on the microarray
  • GO has a hierarchical tree (or graph) structure, while the hypergeometric test assumes independence of categories.

Common tools:
  • DAVID
  • ArrayXPath
  • Pathway Miner
  • EASE
  • GOFish
  • GOTree
  • etc.


Gene Set-Wise Differential Expression Analysis

Evaluates coordinated differential expression of gene groups.

Gene Set Enrichment Analysis (GSEA):
  • The first method developed in this category
  • For each pre-defined gene set, evaluates the significance of its association with phenotypic classes

Difference between FAA and GSEA:
  • FAA: finds over-represented GO terms in a gene list of interest
  • GSEA: starts from pre-defined gene sets and tests their changes under different conditions


Advantages of gene set-wise differential expression analysis:
  • It has successfully identified modest but coordinated changes in gene expression that might have been missed by conventional gene-wise differential expression analysis (many tiny expression changes can collectively create a big change).
  • Biological interpretation is straightforward because the gene sets are defined by biological knowledge.

Enrichment Score (ES) is calculated by evaluating the fraction of genes in S (hits), weighted by their correlation, and the fraction of genes not in S (misses) present up to a given position i in the ranked gene list L, where N genes are ordered according to their correlation with the phenotype. The ES is the maximum deviation from zero of the running difference between the two fractions.
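The running-sum calculation described above can be sketched as follows. This is a simplified illustration that follows the standard GSEA formulation: the weighting exponent `p` and the function name `enrichment_score` are assumptions, the gene names and correlations are invented, and significance estimation by permutation is omitted.

```python
from typing import List, Set


def enrichment_score(ranked_genes: List[str], correlations: List[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Maximum deviation from zero of P_hit(i) - P_miss(i) along the list."""
    n_total = len(ranked_genes)
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Normalizer for the weighted hit fraction.
    norm = sum(abs(r) ** p for r, hit in zip(correlations, hits) if hit)
    if norm == 0:
        raise ValueError("all hit correlations are zero")
    p_hit = 0.0
    p_miss = 0.0
    best = 0.0
    for r, hit in zip(correlations, hits):
        if hit:
            p_hit += abs(r) ** p / norm       # weighted step up at a hit
        else:
            p_miss += 1.0 / (n_total - n_hits)  # uniform step down at a miss
        deviation = p_hit - p_miss
        if abs(deviation) > abs(best):
            best = deviation
    return best


# Hypothetical ranked list: genes ordered by correlation with the phenotype.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.1, -0.5, -0.8]
es_top = enrichment_score(ranked, corr, {"g1", "g2"})
es_bottom = enrichment_score(ranked, corr, {"g5", "g6"})
```

A set concentrated at the top of the list gives a large positive score, and a set concentrated at the bottom gives a large negative one.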

Typical gene sets:
  • regulatory-motif sets
  • function-related sets
  • disease-related sets

Database:
  • MSigDB: 6769 gene sets classified into five different collections
  • Has some interesting extensions

Differential Co-Expression Analysis

  • Co-expression analysis: determines the degree of co-expression of a cluster of genes under a certain condition
  • Differential co-expression analysis: determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions

Three major types:
  • (a) differential co-expression of gene cluster(s)
  • (b) gene pair-wise differential co-expression
  • (c) differential co-expression of paired gene sets


Type (a): identify differentially co-expressed gene cluster(s) between two conditions. Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes.
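The formula itself is missing from this transcript. A minimal sketch follows, assuming the usual additive model in which each expression value is explained by a gene effect plus a condition effect, so that the residual of entry (i, j) is x_ij minus its row mean, minus its column mean, plus the overall mean. The normalization by (|I| - 1)(|J| - 1) is also an assumption, and the matrices are invented.

```python
from typing import List


def mean_squared_residual(x: List[List[float]]) -> float:
    """x[i][j] is the expression of gene i under condition (sample) j.

    A low value means the genes rise and fall together across conditions.
    """
    n_genes, n_cond = len(x), len(x[0])
    row_mean = [sum(row) / n_cond for row in x]
    col_mean = [sum(x[i][j] for i in range(n_genes)) / n_genes
                for j in range(n_cond)]
    overall = sum(row_mean) / n_genes
    total = sum(
        (x[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
        for i in range(n_genes)
        for j in range(n_cond)
    )
    # Assumed normalization: degrees of freedom of the additive model.
    return total / ((n_genes - 1) * (n_cond - 1))


# Hypothetical cluster: tightly co-expressed (rows differ only by an offset).
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
# Hypothetical cluster: the two genes move in opposite directions.
opposed = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]

score_coherent = mean_squared_residual(coherent)
score_opposed = mean_squared_residual(opposed)
```

Comparing this score for the same gene cluster under two conditions is one way to express how much its co-expression differs between them.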

Type (b): identify differentially co-expressed gene pairs

Techniques:
  • F-statistic
  • A meta-analytic approach

Note that identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs. Thus, the interpretation may also be improved by ontology- and pathway-based annotation analysis.

Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions. Biological pathways can be used as pre-defined gene sets, and the differential co-expression of the biological pathway pairs between conditions is analyzed.

Type (c) cont.: To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies. IS can always be obtained, even when the two pathways contain different numbers of genes, because it uses only sample-wise distances.
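The sketch below illustrates why the score does not depend on the number of genes in each set. It is not the dCoxS implementation: plain Euclidean distance between samples is swapped in for the entropy-based sample-wise measure, and the function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List


def sample_distances(expr: List[List[float]]) -> List[float]:
    """expr[g][s] is gene g in sample s; return one distance per sample pair.

    Euclidean distance is a stand-in (assumption) for the entropy-based
    sample-wise measure mentioned in the slides.
    """
    n_samples = len(expr[0])
    return [
        sqrt(sum((row[a] - row[b]) ** 2 for row in expr))
        for a, b in combinations(range(n_samples), 2)
    ]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def interaction_score(set_a: List[List[float]],
                      set_b: List[List[float]]) -> float:
    """Correlate the two gene sets' sample-pair distance vectors."""
    return pearson(sample_distances(set_a), sample_distances(set_b))


# Hypothetical pathways measured on the same four samples:
# pathway_a has two genes, pathway_b has three.
pathway_a = [[1.0, 2.0, 4.0, 7.0], [0.0, 1.0, 3.0, 5.0]]
pathway_b = [[2.0, 4.0, 8.0, 14.0], [0.0, 2.0, 6.0, 10.0], [5.0, 5.0, 5.0, 5.0]]
score = interaction_score(pathway_a, pathway_b)
```

Both distance vectors have one entry per sample pair (six here), so they can be correlated directly; the score would then be computed per condition and compared across conditions.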


Biological Interpretation and Biological Semantics

Biomedical semantics provides rich descriptions for biomedical domain knowledge.

Motivation for biological semantics: GO has limitations.
  • The result of a GO analysis is typically a long, unordered list of annotations
  • Most analysis tools evaluate only one cluster at a time
  • Reading the massive annotation lists is time-consuming
  • The results are hard to assemble manually
  • Many annotations are redundant

Introducing BioLattice:
  • A mathematical framework based on concept lattice analysis
  • Organizes traditional clusters and their associated annotations into a lattice of concepts
  • Provides a graphical summary
  • Considers gene expression clusters as objects and annotations as attributes

Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
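To show what a lattice of concepts contains, here is a brute-force sketch of formal concept analysis on a toy context. Each concept pairs a set of clusters (the extent) with the full set of annotations they share (the intent). The clusters, annotations, and function names are hypothetical, and this exhaustive enumeration is exponential in the number of objects; it is not the Ganter algorithm and not BioLattice itself.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Toy context (hypothetical): clusters as objects, annotations as attributes.
CONTEXT: Dict[str, Set[str]] = {
    "cluster1": {"apoptosis", "cell cycle"},
    "cluster2": {"apoptosis", "immune response"},
    "cluster3": {"apoptosis", "cell cycle", "DNA repair"},
}


def common_attributes(objects: Iterable[str],
                      context: Dict[str, Set[str]]) -> Set[str]:
    """Attributes shared by every given object (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attrs: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Objects that carry every attribute in attrs."""
    return {obj for obj, own in context.items() if attrs <= own}


def formal_concepts(
    context: Dict[str, Set[str]]
) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(CONTEXT)
```

Ordering these concepts by inclusion of their extents gives the lattice; for example, the concept shared by all three clusters ("apoptosis") sits above the more specific concept for `cluster1` and `cluster3` ("apoptosis", "cell cycle").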

Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added.


Tool to construct BioLattice:
  • The Ganter algorithm
  • http://www.snubi.org/software/biolattice/

Conclusion

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.