Daniel Rico, PhD. drico@cnio.es ::: Introduction to Functional Analysis. Course on Functional Analysis. Bioinformatics Unit, CNIO

Page 1: Daniel Rico , PhD.  drico@cnio.es

Daniel Rico, PhD. drico@cnio.es

::: Introduction to Functional Analysis

Course on Functional Analysis

Bioinformatics Unit, CNIO

Page 2: Daniel Rico , PhD.  drico@cnio.es

::: Schedule.

1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO
4. Threshold-free example 1: FatiScan

Page 3: Daniel Rico , PhD.  drico@cnio.es

ACKNOWLEDGEMENTS

Many of these slides have been taken and adapted from original slides by Fatima Al-Shahrour from Joaquin Dopazo’s group (Babelomics team).

We are grateful for the material and for the great tools they have developed!

Page 4: Daniel Rico , PhD.  drico@cnio.es

Biological databases

Species: Arabidopsis thaliana, Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Gallus gallus, Danio rerio

Gene IDs: HGNC symbol, EMBL acc, RefSeq, PDB, Protein Id, UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, EntrezGene, Affymetrix, Agilent, IPI…

Functional annotation:

• Gene Ontology: Biological Process, Molecular Function, Cellular Component

• KEGG pathways

• Biocarta pathways

• Regulatory elements: miRNA, CisRed, Transcription Factor Binding Sites

• InterPro Motifs

• Bioentities from literature: Diseases terms, Chemical terms

• Gene Expression in tissues

• Keywords Swissprot

Page 5: Daniel Rico , PhD.  drico@cnio.es

Gene Ontology CONSORTIUM

http://www.geneontology.org

• The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.

• These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them.

• The controlled vocabularies of terms are structured

Page 6: Daniel Rico , PhD.  drico@cnio.es

GO structure

The three categories of GO

Molecular Function

the tasks performed by individual gene products; examples are transcription factor and DNA helicase

Biological Process

broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component

subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex

GO tree structure

IS_A relation

PART_OF relation
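The tree structure above can be sketched as a small directed graph. The example below is only an illustration: the term names and edges are a toy graph, not copied from a GO release, and the function names `ancestors` and `propagate` are hypothetical. It assumes the usual convention that a gene annotated to a specific term is also counted for every parent reachable through IS_A or PART_OF relations.

```python
from collections import deque

# Toy graph: child term -> list of (parent term, relation).
# The edges are illustrative and are not copied from a GO release.
PARENTS = {
    "nucleolus": [("nucleus", "part_of")],
    "nucleus": [("organelle", "is_a")],
    "organelle": [("cellular_component", "is_a")],
    "cellular_component": [],
}

def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` via is_a or part_of edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent, _relation in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def propagate(annotations, parents=PARENTS):
    """Expand gene -> terms so each gene also carries all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

# Hypothetical gene annotated only to the most specific term.
expanded = propagate({"geneX": {"nucleolus"}})
```

After propagation, `geneX` is counted for "nucleolus" and for each of its parent terms, so an enrichment test on a general term also sees genes annotated only to its more specific children.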

Page 7: Daniel Rico , PhD.  drico@cnio.es

http://www.genome.ad.jp/kegg/pathway.html

Page 8: Daniel Rico , PhD.  drico@cnio.es

http://www.biocarta.com/genes/index.asp

Page 9: Daniel Rico , PhD.  drico@cnio.es

http://www.reactome.org/

Page 10: Daniel Rico , PhD.  drico@cnio.es

http://www.pathwaycommons.org

Page 11: Daniel Rico , PhD.  drico@cnio.es

http://www.whichgenes.org/

Page 12: Daniel Rico , PhD.  drico@cnio.es

http://www.cisred.org/

Page 13: Daniel Rico , PhD.  drico@cnio.es

::: Schedule.

1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO
4. Threshold-free example 1: FatiScan

Page 14: Daniel Rico , PhD.  drico@cnio.es

Threshold-based functional analysis

Study the enrichment in functional terms in groups of genes defined by the experimental value.

The two-step approach:

• Genes of interest are selected using the experimental value.

• Selected genes are compared to the background.

Tools: FatiGO, GoMiner, DAVID, Marmite

Threshold-free functional analysis

Select genes taking into account their functional properties.

• Under a systems biology perspective.

• Detect blocks of functionally related genes.

Tools: FatiScan, GSEA, MarmiteScan

Page 15: Daniel Rico , PhD.  drico@cnio.es

Threshold-based functional analysis

[Figure: Class1 vs Class2 comparison with t-test cut-offs (FDR<0.05 marked twice). Question posed on the slide: Biological meaning?]

Page 16: Daniel Rico , PhD.  drico@cnio.es

Threshold-free functional analysis

[Figure: Class1 vs Class2 comparison with the t-test cut-off marked; Gene Sets 1, 2 and 3 are scored with the ES/NES statistic on a scale from - to +. Gene set 2 is enriched in Class 1; gene set 3 is enriched in Class 2.]
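To make the idea of an enrichment score (ES) over a ranked gene list more concrete, here is a minimal sketch of an unweighted running-sum statistic. It is a simplification written for illustration: it is not the exact GSEA statistic (which weights hits by the ranking metric) and it is not the FatiScan procedure. The gene names, the function name `enrichment_score` and the sign convention (positive means enrichment at the top of the list) are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic over a ranked gene list.

    Walk down the ranking, stepping up at each gene that belongs to
    `gene_set` and down at each gene that does not, and return the
    running sum with the largest absolute value.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, from most Class1-like (g1) to most Class2-like (g8).
ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranking, {"g1", "g2", "g3"})     # set at the top
es_bottom = enrichment_score(ranking, {"g6", "g7", "g8"})  # set at the bottom
es_mixed = enrichment_score(ranking, {"g1", "g5", "g8"})   # set spread out
```

A set concentrated at one end of the ranking gets a score close to +1 or -1, while a set spread along the ranking stays near zero. No gene-level cut-off is needed. A normalized score (NES) additionally requires comparing the ES with scores obtained from permuted data, which this sketch does not do.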

Page 17: Daniel Rico , PhD.  drico@cnio.es

::: Schedule.

1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO
4. Threshold-free example 1: FatiScan

Page 18: Daniel Rico , PhD.  drico@cnio.es

http://babelomics.bioinfo.cipf.es/

Page 19: Daniel Rico , PhD.  drico@cnio.es

::: How functional profiling should never be done

It is not uncommon to find the following assertion in papers and talks: “then we examined our set of genes selected in this way (whatever) and we discovered that 65% of them were related to metabolism, so we can conclude that our experiment activates metabolism genes”.

Annotation is not a functional result!!!
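A small numeric sketch shows why the percentage alone says little. All gene names and counts below are hypothetical, and `fold_enrichment` is an illustrative helper, not part of any tool mentioned here. The point is that the annotated fraction in the selected genes has to be compared with the fraction in a reference set.

```python
def fold_enrichment(selected, background, annotated):
    """Annotated fraction in `selected`, in `background`, and their ratio."""
    frac_sel = len(selected & annotated) / len(selected)
    frac_bg = len(background & annotated) / len(background)
    return frac_sel, frac_bg, frac_sel / frac_bg

# Hypothetical genome of 100 genes, 65 of them annotated to metabolism.
genome = [f"g{i}" for i in range(100)]
metabolism = set(genome[:65])

# Hypothetical selection of 20 genes: 13 metabolism genes and 7 others.
selected = set(genome[:13]) | set(genome[65:72])

frac_sel, frac_bg, ratio = fold_enrichment(selected, set(genome), metabolism)
```

Here 65% of the selected genes are related to metabolism, but so are 65% of all genes, so the ratio is 1 and the selection is no more "metabolic" than a random draw.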

Page 20: Daniel Rico , PhD.  drico@cnio.es

::: Exercise 1: FatiGO SEARCH

1. Select “FatiGO Search” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “KEGG pathways” and click “Run”.

Page 21: Daniel Rico , PhD.  drico@cnio.es

::: Exercise 1: FatiGO SEARCH

1. Select “FatiGO Search” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “KEGG pathways” and click “Run”.

FatiGO-Search annotations

Page 22: Daniel Rico , PhD.  drico@cnio.es

Testing the distribution of GO terms among two groups of genes

(remember, we have to test hundreds of GOs)

Are these two groups of genes carrying out different biological roles?

                 Group A   Group B
Biosynthesis       60%       20%
Sporulation        20%       20%

Genes in group A are significantly associated with biosynthesis, but not with sporulation.

                    A    B
Biosynthesis        6    2
No biosynthesis     4    8
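The comparison in the table can be written as a 2x2 contingency test. The sketch below reads the table as 6 and 2 biosynthesis genes versus 4 and 8 non-biosynthesis genes in groups A and B, which matches the 60% and 20% shown. It implements a two-sided Fisher's exact test with the standard library only. Whether FatiGO uses a one-sided or a two-sided test is not stated on the slides, so the two-sided choice is an assumption, and `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small factor guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Rows: biosynthesis / no biosynthesis. Columns: group A / group B.
p_small = fisher_exact_two_sided(6, 2, 4, 8)

# Same proportions with ten times as many genes (hypothetical counts).
p_large = fisher_exact_two_sided(60, 20, 40, 80)
```

With ten genes per group the two-sided p-value is about 0.17. The same 60% versus 20% with a hundred genes per group gives a far smaller value, so the test depends on the counts behind the percentages, not on the percentages alone.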

Page 23: Daniel Rico , PhD.  drico@cnio.es

Using FatiGO

List1: genes of interest (they are significantly over- or under-expressed when two classes of experiments are compared, co-located in the chromosomes, etc.)

List2: the background (typically the rest of genes).

Select suitable database, Run...

Comparing groups of genes

[Figure: FatiGO workflow, read from the diagram labels]

1. Clean the lists: remove genes repeated in list1, remove genes repeated in list2, and remove genes repeated between both lists. This gives “clean” List1 and “clean” List2.
2. Extract functional terms from BABELOMICS: GO, KEGG, Interpro, KW, Bioentities, Gene Expression, TF, Cisred.
3. Build the matrix of functional terms (a binary 0/1 matrix).
4. Fisher's test.
5. Adjust p-value by FDR.
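Two of the steps named on this slide, cleaning the lists and adjusting the p-values, can be sketched in a few lines. This is not the FatiGO source code. Two choices are assumptions made for the example: genes found in both lists are dropped from both, and the FDR adjustment is Benjamini-Hochberg. The function names are illustrative.

```python
def clean_lists(list1, list2):
    """Drop repeats within each list, then drop genes present in both lists."""
    unique1 = list(dict.fromkeys(list1))  # keeps first occurrence, in order
    unique2 = list(dict.fromkeys(list2))
    shared = set(unique1) & set(unique2)
    return ([g for g in unique1 if g not in shared],
            [g for g in unique2 if g not in shared])

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene identifiers and per-term p-values.
clean1, clean2 = clean_lists(["a", "b", "a", "c"], ["c", "d", "d"])
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each functional term produces one p-value from the Fisher's test, so with hundreds of terms the adjustment step is what keeps the proportion of false positives among the reported terms under control.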

Page 24: Daniel Rico , PhD.  drico@cnio.es

[Figure: Class1 vs Class2 comparison with t-test cut-offs (FDR<0.05 marked twice). Labels: List 1, List 2 (background), List 1b / List 2b.]

Page 25: Daniel Rico , PhD.  drico@cnio.es

::: Exercise 2: FatiGO COMPARE

1. Select “FatiGO Compare” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “Rest of Genome” as background.
4. Select “KEGG pathways” and click “Run”.

Page 26: Daniel Rico , PhD.  drico@cnio.es

::: Exercise 2: FatiGO COMPARE

1. Select “FatiGO Compare” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “Rest of Genome” as background.
4. Select “KEGG pathways” and click “Run”.

Only “Apoptosis” is significant