Bioinformatics and statistical tools for biomedical research

Data analysis with Ingenuity Pathways

Alex Sánchez, Unitat d’Estadística i Bioinformàtica
07/07/2010

We are drowning in information and starved for knowledge

John Naisbitt

Who on efficient work is bent,
Must choose the fittest instrument.

Goethe (Faust)

Presentation outline

• Beyond microarrays…
• Ingenuity Pathways Analysis
  – Overview
  – Components
  – Types of studies
• Usage examples
  – Exploring and searching for information
  – Data analysis

Beyond microarrays…

A microarray experiment...

Lists of selected identifiers (genes, miRNAs, …)

So Where do we go from here? Or, How To Drive A Biologist Crazy?

Ted Slater
Proteomics Center of Emphasis
Pfizer Global R&D Michigan

• gi|84939483 • gi|39893845 • gi|27394934 • gi|18890092 • gi|10192893 • gi|11243007 • gi|20119252 • gi|19748300

• gi|44308356 • gi|50021874 • gi|10003001 • gi|27762947 • gi|24537303 • gi|27284958 • gi|37373499 • …

From lists to biology

• Traditional approach to analysing gene lists: one gene at a time
  – Literature, databases, ...
• Problem:
  – The task is slow, tedious and, worse still, ...
  – It ignores possible interactions
• Alternative approach: functional analysis, or analysis of “biological significance”.

Functional analysis methods

• They are automated methods for
  – Identifying biological processes associated with the experimental results.
  – Determining the functional themes shared by groups of selected genes.
  – Analysing the connections between genes, molecules, and diseases by automatically mining the literature for associations relevant to the experimental results.
• They make it easier to use auxiliary information.
• They help to understand the underlying biological phenomena.

Functional analysis tools

• Dozens of programs in the last 10 years: http://estbioinfo.stat.ub.es/resources/index.html
• Direct study of gene lists
  – Based on GO or other databases (KEGG, ...)
    • fatiGO, DAVID, GSEA, Babelomics ... [SerbGO]
    • Ingenuity Pathways Analysis
• Exploring relationships in the literature
  – PubMed, Scopus, HighWire, GOPubMed, …
  – Ingenuity Pathways Analysis
• Study of the pathways associated with the lists
  – Pathway Explorer, GenMAPP,
  – Ingenuity Pathways Analysis

Courses and materials

• CNIO
  – 4th Course on Functional Analysis of Gene Expression
• Canadian Bioinformatics Workshop
  – Interpreting gene lists from omics sets
• EADGENE and SABRE
  – Post-analyses Workshop

Examples of functional analysis

Example 1

• The Polycomb group protein EZH2 is involved in progression of prostate cancer (Nature, 419 (10) 624-629)
  – Varambally et al. (2002) study the differences between localized (PCA) and metastatic (MET) prostate cancer
    • EZH2 is overexpressed in MET
    • PCA cases with high EZH2 have a worse prognosis
  – They suggest that EZH2 may
    • Be involved in the progression from PCA to MET
    • Distinguish benign PCA from PCA with a poor prognosis.

Example 1

• Microarray analysis
  – Lists of up-regulated (55) and down-regulated (438) genes
• A functional analysis makes it possible to study
  – Which biological processes (pathways) are related to the genes in the lists
    • Annotation databases
  – Which functions appear in the lists with a frequency different from that among all the genes studied
    • Enrichment analysis
  – The tools available in Babelomics are a good option for this analysis.

Example 2: from genes to pathways

Genes are grouped by function

Functions are associated with pathways

Expression changes are projected onto the pathway

Introduction

“The Ingenuity View”

Ingenuity Pathways Analysis

• Ingenuity Pathways Analysis (IPA) is an all-in-one software application that enables researchers to model, analyze, and understand the complex biological and chemical systems at the core of life science research

IPA Challenge: Integrate, Interpret, Gain Therapeutic Insight from Experimental Data

Figure: data from experimental platforms (expression arrays, proteomics, traditional assays) are interpreted at the level of molecules (FAS, VEGFA, bevacizumab), cellular processes (apoptosis, angiogenesis), and disease processes (cancer). Figure labels: disease/physiological response; overlapping cellular processes/pathways; molecular interactions; molecular perturbation.

IPA Challenge: Gain Rapid Understanding of Experimental Systems

Figure: data from experimental platforms (expression arrays, proteomics, traditional assays) are interpreted at the level of molecules (FAS, VEGFA, bevacizumab), cellular processes (angiogenesis, apoptosis), and disease processes (cancer). Figure labels: guide in vivo/in vitro assays; search for genes implicated in disease; identify related cellular processes/pathways; generate hypothesis.

Ingenuity Platform

• Findings manually extracted from full text

• Extensive libraries of metabolic and signaling pathways

• Chemical and drug information

• Scalable best-in-class content acquisition processes

• Designed to enable computation

• Consists of biological objects and processes organized into major branches

• Robust, up-to-date synonym library

• Knowledge infrastructure tools and processes for structuring biological and chemical knowledge

Ingenuity Knowledge Base

Content Ontology

Ingenuity Knowledge Base: Content

Expert Extraction: Full text from top journals

• Coverage of peer-reviewed journals, plus review articles and textbooks

• Manually extracted by Ph.D. scientists

Import Annotations, Findings:

• OMIM, GO, Entrez Gene

• Tissue and Fluid Expression Location

• Molecular Interactions (e.g. BIND, DIP, TarBase)

Internally curated knowledge:

• Signaling & Metabolic Pathways

• Drug/Target/Disease relationships

• Toxicity Lists

All findings are structured for computation and updated regularly

How they work together

Types of analysis

Questions and answers

Installation, access, and use

Installation and getting started

• IPA runs online.
  – It is not installed; you simply access it.
• Using it requires an account
  – Trial (15 days).
  – Access (IRHUVH and HVH) by prior booking with the UEB, in a morning or afternoon slot.
• It runs on Windows or Mac, but not on Linux

System requirements

Access

Ways of starting IPA

The Ingenuity environment

Screens, menus, help

Quick start screen

Project manager

Search bar

Help (1): help systems

Help (2): tutorials

Help (3): workflows

Training programme

Basic capabilities of the program

Search, Analysis, Communication

Search & Explore

Search and Explore Biological & Chemical Knowledge

Gene View / Chem View

Dynamic Signalling & Metabolic Pathways

My Pathways & Lists

• Build custom libraries of pathways representing mechanism of action and mechanism of toxicity. Create custom, literature-supported signaling pathways with proteins of interest. Store collections of custom pathways and lists for subsequent core, IPA-Tox™, IPA-Biomarker™, or IPA-Metabolomics™ analyses.
• Use the Grow and Connect tools to edit and expand networks based on the molecular relationships most relevant to the project:
  – Transcriptional networks
  – Phosphorylation cascades
  – Protein-Protein or Protein-DNA interaction networks
  – microRNA-mRNA target networks
  – Chemical effects on proteins
• Use Search results as building blocks for custom pathways
  – Identify cross-talk between biological processes and pathways
  – Understand whether gene lists and signatures are tightly connected at the molecular level

Path Explorer and Path Designer

Analyze & Interpret data

• IPA Core Analysis
• IPA Tox Analysis
• IPA Biomarker Analysis
• IPA Metabolomic Analysis

IPA Core Analysis

IPA-Biomarker™ Analysis

• IPA-Biomarker identifies the most promising and relevant biomarker candidates within experimental data.
  – Prioritize molecular biomarker candidates based on key biological characteristics.
  – Elucidate the mechanism linking potential markers to a disease or biological process of interest.
  – Perform analysis across biomarker lists to find biomarker candidates unique to a disease stage or common across all stages.
  – Understand the molecular differences between patient populations.

IPA-Tox Analysis

• IPA-Tox delivers a focused toxicity and safety assessment of candidate compounds.
  – Enables assessment of the toxicity and safety of compounds early in the development process.
  – Provides expert molecular toxicology data interpretation to non-expert users.
  – Reveals clinical pathology endpoints associated with a dataset.
  – Generates new hypotheses that may not have been revealed using traditional toxicology approaches.
  – Elucidates mechanisms of toxicity and identifies potential markers of toxicity.

IPA-Metabolomics Analysis

• IPA-Metabolomics extracts rich pathway information from metabolomics data.
  – Overcomes the metabolomics data analysis challenge by integrating transcriptomics, proteomics, and metabolomics data to enable a complete systems biology approach.
  – Provides the critical context necessary to gain insights into cell physiology and metabolism from metabolite data.

Communicate & Collaborate

• Share
• Report
• Interactive Pathways
• Integrate with other software

Summary and recap

Summary

• Functional analysis improves the understanding of biological phenomena by studying groups of values simultaneously.
• Ingenuity Pathways makes it possible to
  – Explore
  – Analyse
  – Communicate and share

Advantages and disadvantages

Advantages:
• Intuitive and easy to use
• All functions are integrated
• Very powerful for human and cancer studies

Disadvantages:
• Not as powerful for other species or diseases.
• It is not free; it must be paid for
• It does not include powerful advanced algorithms such as GSEA

Networks and Pathways in IPA

Networks

• A network is a set of terms (“nodes”) related by a set of relations (“edges”).

• IPA transforms a list of genes into a set of relevant networks based on information maintained in the Ingenuity Pathways Knowledge Base (IPKB)

• This knowledge base has been abstracted into a large network, called the Global Molecular Network, composed of thousands of genes and gene products that interact with each other.

A network in IPA

Networks in IPA

• Purpose:
  – To show as many interactions as possible between user-specified molecules in a given dataset, and how they might work together at the molecular level
• Why are Ingenuity networks biologically interesting?
  – Highly interconnected networks are likely to represent significant biological function

Key Terminology

• Focus Molecules:
  – Molecules that come from the uploaded list, pass the applied filters, and are available for generating networks
• Networks:
  – Generated de novo based upon input data
  – Do not have directionality
  – Contain molecules involved in a variety of Canonical Pathways
• Canonical Pathways (Signaling and Metabolic)
  – Are generated prior to data input, based on the literature
  – Do NOT change upon data input
  – Do have directionality (proceed “from A to Z”)
• My Pathways and Path Designer Pathways
  – Custom-built pathways manually created based on user input

Viewing networks

How Networks Are Generated

1. Focus molecules are “seeds”
2. Focus molecules with the most interactions to other focus molecules are then connected together to form a network
3. Non-focus molecules from the dataset are then added
4. Molecules from Ingenuity’s KB are added
5. Resulting networks are scored and then sorted based on the score

Networks are limited to 35 molecules for visualization purposes
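The steps above can be pictured with a small greedy sketch. This is only an illustration under stated assumptions, not IPA's actual algorithm: the function name `build_network`, the `interactions` dictionary (a hypothetical, symmetric stand-in for the knowledge base), and the tie-breaking rules are all invented here. Scoring and sorting (step 5) are left out.

```python
def build_network(focus, dataset, interactions, max_size=35):
    """Greedy toy sketch of steps 1-4; NOT IPA's actual algorithm.

    focus: set of focus molecules (the seeds)
    dataset: set of all dataset molecules (focus and non-focus)
    interactions: dict mapping each molecule to the set of its partners;
                  a hypothetical, symmetric stand-in for the knowledge base
    max_size: cap on network size (35 in the slide)
    """
    if not focus:
        return []

    def partners(molecule):
        return interactions.get(molecule, set())

    # Steps 1-2: seed with the focus molecule that has the most focus partners.
    seed = max(sorted(focus), key=lambda m: len(partners(m) & focus))
    network = [seed]

    # Every molecule mentioned anywhere in the interaction table.
    all_molecules = set(interactions)
    for partner_set in interactions.values():
        all_molecules |= partner_set

    # Candidate tiers, in the order given on the slide: focus molecules,
    # non-focus dataset molecules (step 3), knowledge-base molecules (step 4).
    tiers = [focus, dataset - focus, all_molecules - dataset - focus]

    for tier in tiers:
        grew = True
        while grew and len(network) < max_size:
            grew = False
            members = set(network)
            # Molecules of this tier that interact with the current network.
            candidates = [m for m in sorted(tier - members)
                          if partners(m) & members]
            if candidates:
                # Add the candidate with the most links into the network.
                best = max(candidates,
                           key=lambda m: len(partners(m) & members))
                network.append(best)
                grew = True
    return network
```

For example, `build_network({"A", "B"}, {"A", "B", "D"}, table)` would return an ordered list of molecules starting from the best-connected focus molecule, where `table` is whatever interaction dictionary you supply.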

Calculation of Score for Networks in IPA

• Based on the right-tailed Fisher's Exact Test
• Used as a means to rank/sort networks so that those with the most focus molecules are at the top of the list
• Takes into account the number of focus molecules in the network and the size of the network
• Not an indication of the quality or biological significance of the network

Network notation (1) (Help legend)

Network notation (2)

Significance Calculations

• Measures how likely it is that the association between a function and the molecules in your dataset is due to chance alone

• Expressed as a p-value calculated by using the right-tailed Fisher's Exact Test

• Range indicates most significant low-level function to least significant low-level function
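As a minimal sketch of how such a p-value can be computed: the right-tailed Fisher's Exact Test on a 2x2 table is equivalent to the upper tail of a hypergeometric distribution. The function name, the argument names, and the numbers in the usage line are illustrative assumptions. This is not IPA's implementation, and the choice of reference set is left to the reader.

```python
from math import comb


def right_tailed_fisher(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: dataset molecules annotated with the function
    n: molecules in the dataset
    K: reference-set molecules annotated with the function
    N: molecules in the reference set
    """
    upper = min(n, K)      # largest overlap that is possible
    total = comb(N, n)     # number of ways to draw the dataset
    # Sum the probabilities of overlaps at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 12 of 100 dataset molecules carry the function,
# against 200 of 10000 molecules in the reference set.
p_value = right_tailed_fisher(k=12, n=100, K=200, N=10000)
```

A small p-value means that an overlap this large would rarely arise if the dataset were a random draw from the reference set.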

Multiple Testing Correction

• Benjamini-Hochberg method of multiple testing correction
• Calculates the False Discovery Rate
  – The threshold indicates the expected fraction of false positives among significant functions
  – At a threshold of 0.05 (on a scale from 0 to 1.0), 5% (1 in 20) of the significant functions may be false positives
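The Benjamini-Hochberg adjustment itself is short enough to sketch. The version below is a plain-Python illustration of the standard step-up procedure. The function name and the example p-values are assumptions made for this sketch, and nothing here is taken from IPA's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four functions.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
```

Functions whose adjusted value falls at or below the chosen threshold (0.05 here) are reported as significant at that False Discovery Rate.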

Which p-value calculation should I use?

• What is the significance of function X relative to the dataset?
  – Use the right-tailed Fisher’s Exact Test result
• What is the significance of function X relative to all the other functions in the dataset?
  – Use the Benjamini-Hochberg multiple testing correction