Genopolis Microarray DB
a Progress Report

Marco Brandizi <[email protected]>

Dec 12, 2005

Dottorato in Informatica (PhD in Computer Science), XIX Ciclo


Outline

Introduction

GCA Application

Main features

Demo

Demo/Gene Browser

Recently added features

Access control

Search & Save

Ongoing and future work

MAGE Export

Migration to a cluster

Management of knowledge about Higher Level Analysis

Other possible developments

[Figure: DNA, gene, mRNA, protein; "Genes Machine"; Cell/Life]


Microarray Data, conceptual model


Microarray Data Management Issues

Expression data vs. sequence data:

Context-dependent (living system, experimental conditions)

Lack of a standard unit of measure

Several normalization methods

Multiple platforms and methods

No standard for data annotation

Coherence of vocabularies and terminology

Details about: experiment, source, protocols, experimental conditions

Microarray Data Management Issues / 2

Evidence about data quality

What to store?

Raw images

Computed values

Normalized values

How to find data:

Systems aware of complex vocabularies (ontologies)

Data mining and experiment comparison tools

Data access control
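A system aware of vocabulary structure lets a query for a general term also find data annotated with more specific terms. A minimal sketch of that idea, assuming a toy is-a hierarchy; the terms, the `NARROWER` table and the `expand`/`search` helpers are hypothetical and not part of GCA:

```python
# Hypothetical is-a hierarchy: each term maps to its directly narrower terms.
NARROWER = {
    "immune cell": ["dendritic cell", "lymphocyte"],
    "lymphocyte": ["T cell", "B cell"],
}


def expand(term):
    """Return `term` together with every term below it in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(NARROWER.get(current, []))
    return found


def search(experiments, term):
    """Return the ids of experiments whose annotation falls under `term`."""
    terms = expand(term)
    return sorted(eid for eid, label in experiments.items() if label in terms)


experiments = {"E1": "dendritic cell", "E2": "T cell", "E3": "hepatocyte"}
print(search(experiments, "immune cell"))  # ['E1', 'E2']
```

A plain keyword match on "immune cell" would find nothing here; the expansion step is what makes the search vocabulary-aware.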


MIAME Experiment Modeling


GCA Features

Curated experimental design representation

MIAME-compliant (although with a simplified model)

Use of controlled vocabularies

Experiment checking/publishing, with supervision

Targeted at the Affymetrix platform

Chip description is simple, imported from NetAffx

Single-channel technology

Access control

Users are organized into groups and assigned access roles

Experiments belong to user groups
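Checking an experiment before a supervisor publishes it can be pictured as validating required fields and controlled-vocabulary terms. A minimal sketch under that assumption; the field names, vocabularies and the `check_experiment` helper are hypothetical, not GCA's actual checks:

```python
# Hypothetical controlled vocabularies and required fields, for illustration only.
CONTROLLED_VOCABULARIES = {
    "organism": {"Homo sapiens", "Mus musculus"},
    "cell_type": {"dendritic cell", "macrophage"},
}
REQUIRED_FIELDS = ["title", "organism", "cell_type", "protocol"]


def check_experiment(experiment):
    """Return a list of problems; an empty list means the experiment is ready
    to be submitted to a supervisor for publishing."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not experiment.get(name)]
    for name, allowed in CONTROLLED_VOCABULARIES.items():
        value = experiment.get(name)
        if value and value not in allowed:
            problems.append(f"{name}: '{value}' is not a controlled term")
    return problems


draft = {"title": "Time course", "organism": "Mus musculus",
         "cell_type": "DC", "protocol": ""}
print(check_experiment(draft))
# ['missing field: protocol', "cell_type: 'DC' is not a controlled term"]
```

Free-text values such as "DC" are exactly what controlled vocabularies are meant to catch, since they make later searches unreliable.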

GCA Features

Data retrieval and visualization:

Gene browser, a graphical visualization interface based on the matrix model

Search & Save data

Current content:

A set of time courses of dendritic cells (DCs) treated with different stimuli

Implementation & Deployment:

LAMP application (Linux + Apache + MySQL + PHP)

Model-View-Controller, as far as possible:

Business objects layer

Presentation widgets (DAO-lib)

Other application control layers
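The gene browser's matrix model can be pictured as a probes-by-chips table. A minimal sketch, assuming rows are probe sets, columns are chips and cells hold expression values; the `ExpressionMatrix` class, its `submatrix` method and the sample IDs are hypothetical, not GCA's business objects:

```python
from dataclasses import dataclass


@dataclass
class ExpressionMatrix:
    probe_ids: list   # row labels, e.g. Affymetrix probe set IDs
    chip_ids: list    # column labels, one per hybridized chip
    values: list      # values[i][j] = signal of probe i on chip j

    def submatrix(self, probes, chips):
        """Return the cells for the chosen probes and chips, in the requested order."""
        rows = [self.probe_ids.index(p) for p in probes]
        cols = [self.chip_ids.index(c) for c in chips]
        return [[self.values[r][c] for c in cols] for r in rows]


m = ExpressionMatrix(
    probe_ids=["p1", "p2", "p3"],
    chip_ids=["0h", "2h", "4h"],
    values=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
)
print(m.submatrix(["p3", "p1"], ["4h", "0h"]))  # [[9.0, 7.0], [3.0, 1.0]]
```

Selecting rows (genes of interest) and columns (chips, e.g. time points) is the basic operation a browser built on this model offers.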

GCA Features

In short:

Gene expression database software, focused on Affymetrix technology, serving as a facility for a distributed community of users


GCA Data Model



GCA Login


GCA Editing


GCA Experiment Checking


GCA Import of chip annotations
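Chip annotations come from NetAffx. A minimal sketch of such an import, assuming a CSV file whose header row includes `Probe Set ID` and `Gene Symbol` columns and whose comment lines start with `#`; these column names and conventions are assumptions to verify against the actual NetAffx file, and `parse_chip_annotations` is a hypothetical helper:

```python
import csv
import io


def parse_chip_annotations(text):
    """Map probe set IDs to gene symbols from a NetAffx-style CSV export.

    Assumed layout: comment lines start with '#', then a header row with
    'Probe Set ID' and 'Gene Symbol' columns.
    """
    lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    annotations = {}
    for row in csv.DictReader(lines):
        symbol = row["Gene Symbol"].strip()
        # Probe sets without an assigned gene (e.g. '---') map to None.
        annotations[row["Probe Set ID"]] = None if symbol in ("", "---") else symbol
    return annotations


sample = (
    "#%comment line\n"
    '"Probe Set ID","Gene Symbol","Gene Title"\n'
    '"1000_at","Il2","interleukin 2"\n'
    '"1001_at","---","---"\n'
)
print(parse_chip_annotations(sample))  # {'1000_at': 'Il2', '1001_at': None}
```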


GCA CVs and protocols



GCA Gene Browser


GCA Access Management

[Figure: groups Bicocca and Besta; users Granucci, Norman, Tiranti, Andrea, Brandizi, Ottavio; ADMIN; Experiment 123; rights levels: all rights, all but admin, R, W, -publish, read only]

User               Permissions
Brandizi, Andrea   All
Granucci           Read
Norman             Read, Write
Tiranti            All (except admin)
Ottavio            None
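The permission levels in the table can be modeled as combinable flags. A minimal sketch, assuming four rights (read, write, publish, admin) and that "All" includes admin; the `Perm` flags and the `can` function are hypothetical and not the security lib's actual API:

```python
from enum import Flag, auto


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    PUBLISH = auto()
    ADMIN = auto()


ALL = Perm.READ | Perm.WRITE | Perm.PUBLISH | Perm.ADMIN

# Rights on Experiment 123, transcribed from the table (assumes "All" includes admin).
EXPERIMENT_123 = {
    "Brandizi": ALL,
    "Andrea": ALL,
    "Granucci": Perm.READ,
    "Norman": Perm.READ | Perm.WRITE,
    "Tiranti": ALL & ~Perm.ADMIN,
    "Ottavio": Perm.NONE,
}


def can(user, right, acl=EXPERIMENT_123):
    """True if `user` holds `right` in the access list; unknown users hold nothing."""
    return right in acl.get(user, Perm.NONE)


print(can("Norman", Perm.WRITE), can("Norman", Perm.PUBLISH))  # True False
```

Denying by default for users missing from the list mirrors the "Ottavio: None" row.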

Access management

Based on a core library

Recent developments (security lib):

The code has been changed to use the security lib

All code that interacts with the user has been wrapped with access management controls

Even malicious access attempts have been considered:

Hand-typed URLs

Hand-crafted requests for an uploaded file (to be completed)

Does it work?

Yes, we are fairly sure

But more testing is needed
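A hand-crafted request for an uploaded file needs two checks: the user may read the owning experiment, and the requested path really lies inside that experiment's upload folder. A minimal sketch, assuming one upload folder per experiment under a common root; the folder layout and the `resolve_upload` function are hypothetical:

```python
from pathlib import Path


def resolve_upload(upload_root, experiment_id, filename, user_can_read):
    """Return the path of an uploaded file, or raise PermissionError.

    The user must be allowed to read the experiment, and the resolved path
    must lie inside that experiment's upload folder, so '../' tricks or
    absolute paths cannot reach other files.
    """
    if not user_can_read(experiment_id):
        raise PermissionError(f"no read access to experiment {experiment_id}")
    base = (Path(upload_root) / str(experiment_id)).resolve()
    target = (base / filename).resolve()
    if base not in target.parents:
        raise PermissionError("requested file is outside the experiment folder")
    return target


def allow_all(experiment_id):
    return True  # stand-in for the real permission check


print(resolve_upload("uploads", 123, "chip1.CEL", allow_all).name)  # chip1.CEL
```

Resolving the path before comparing it is what defeats `../124/secret.CEL`-style requests.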


Search and Save


MAGE Export

Will allow exporting a GCA experiment to MAGE/ArrayExpress

A collaboration with EBI, in the context of u-GENE

So far:

Schema of GCA->MAGE (in AE-compatible form)

Basic code fragments (business objects in Java)

Still to do:

Full code

Mappings with the MGED Ontology

Tests with AE
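The pending mapping to the MGED Ontology amounts, at its simplest, to a lookup from GCA annotation fields to ontology classes, plus a list of fields that still need a decision. A minimal sketch; the GCA field names are hypothetical, and the MGED class names (`Organism`, `OrganismPart`, `CellType`, `StrainOrLine`) should be checked against the ontology release actually used:

```python
# Hypothetical GCA field names -> MGED Ontology class names (verify before use).
GCA_TO_MGED = {
    "organism": "Organism",
    "organism_part": "OrganismPart",
    "cell_type": "CellType",
    "strain": "StrainOrLine",
}


def to_mged_characteristics(sample):
    """Split a sample's annotations into (MGED class, value) pairs and the
    field names that still lack a mapping."""
    mapped, unmapped = [], []
    for name, value in sample.items():
        mged_class = GCA_TO_MGED.get(name)
        if mged_class is None:
            unmapped.append(name)
        else:
            mapped.append((mged_class, value))
    return mapped, unmapped


sample = {"organism": "Mus musculus", "cell_type": "dendritic cell",
          "stimulus": "stimulus A"}
print(to_mged_characteristics(sample))
# ([('Organism', 'Mus musculus'), ('CellType', 'dendritic cell')], ['stimulus'])
```

Reporting unmapped fields, rather than dropping them silently, makes gaps visible before the tests with AE.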


GCA on cluster architecture

Three machines, the minimum for a cluster:

Master (Xeon 3.2 GHz, 2 GB RAM)

+ Master clone that ensures high availability

Computation nodes (P4 3 GHz, 512 MB RAM)

1 TB of SCSI disk, shared via NFS

Based on:

Debian (Linux)

Linux Virtual Server (load balancer)

Heartbeat (high availability)
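Linux Virtual Server distributes incoming requests over the real servers according to a scheduling algorithm; weighted least-connection is one of the schedulers it offers. A simplified sketch of that idea, not LVS code: the kernel scheduler also accounts for inactive connections, and the node names and numbers are hypothetical:

```python
def weighted_least_connection(servers):
    """Pick a real server for a new request.

    servers maps a server name to (active_connections, weight); a weight of 0
    takes the server out of rotation. The chosen server minimizes
    active_connections / weight; ties go to the first server listed.
    """
    candidates = {name: (conns, weight)
                  for name, (conns, weight) in servers.items() if weight > 0}
    if not candidates:
        raise RuntimeError("no real server available")
    return min(candidates, key=lambda n: candidates[n][0] / candidates[n][1])


nodes = {"node1": (10, 1), "node2": (12, 2), "node3": (3, 0)}
print(weighted_least_connection(nodes))  # node2
```

Weights let a faster machine receive proportionally more requests than a slower one.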


GCA on cluster architecture

The code needs slight changes:

PHP side and sessions:

Objects saved in the session need to be reloaded properly

See: http://it2.php.net/manual/en/language.oop.magic-functions.php#14473

__wakeup() is already used

__sleep(), with a proper return value, is still to be implemented

MySQL side:

The stable DB:

We need to specify the type of DB access: read-only (RO) mode vs. read/write (RW) mode

RO access uses the local copy of the DB

RW access uses the master copy

The temporary DB:

Only a master copy exists (port 3307 in the current deployment)
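Both changes can be shown together: a session object drops its DB handle when serialized and, when restored on whichever node serves the next request, reconnects to the local copy for RO access or to the master for RW access. A minimal sketch in Python rather than PHP: `__getstate__`/`__setstate__` play the roles of `__sleep()`/`__wakeup()` (in PHP, `__sleep()` returns the names of the properties to serialize, while Python's `__getstate__` returns the state itself). Host names, port 3306 for the stable DB and all class/function names are assumptions; only port 3307 comes from the slide:

```python
import pickle

# Hypothetical deployment values: 3306 (MySQL's default port) for the stable DB
# is an assumption; 3307 for the temporary DB comes from the slide.
MASTER_HOST = "master"
LOCAL_HOST = "localhost"
STABLE_PORT = 3306
TEMP_PORT = 3307


def endpoint(db, mode):
    """Return (host, port) for a DB access; db is 'stable' or 'temp', mode 'ro' or 'rw'."""
    if db == "temp":
        return (MASTER_HOST, TEMP_PORT)    # only a master copy exists
    if mode == "ro":
        return (LOCAL_HOST, STABLE_PORT)   # read-only: local copy on this node
    return (MASTER_HOST, STABLE_PORT)      # read/write: master copy


class SessionObject:
    """Object kept in the user session; its DB handle is never serialized."""

    def __init__(self, data, mode="ro"):
        self.data = data
        self.mode = mode
        self.connect()

    def connect(self):
        # Stand-in for opening a real connection on the node serving the request.
        self.conn = endpoint("stable", self.mode)

    def __getstate__(self):
        # Plays the role of PHP's __sleep(): choose what gets saved.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Plays the role of PHP's __wakeup(): restore, then reconnect.
        self.__dict__.update(state)
        self.connect()


restored = pickle.loads(pickle.dumps(SessionObject({"query": "IL2"}, mode="rw")))
print(restored.conn)  # ('master', 3306)
```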

GCA on cluster architecture

Other possible uses of the cluster:

Heavy computations (normalizations)

Integration with R (graduation theses of L. Vanotti and M. Sesana)

Other integrations with R (AMDA)

Other related services:

Knowledge management application

Groupware integration (DC-Thera)

Cytoscape integration (DC-Thera)

Service management application

R computations

However, the cluster has been designed primarily for GCA


The Microarray Experiment Cycle


“Closing the loop”


What we need to model


How to model: Semantic Web Technologies

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001


Microarrays Annotation Ontology

Microarray entities

Annotation entities

Annotation (source, target, child, parent, rank)
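One possible reading of the Annotation entity, suited to threads of comments and answers, is sketched below. The interpretation of each field is an assumption (source as the annotator, target as the annotated item, parent/child links for replies, rank as the order among replies), and the `Annotation` class and `thread` function are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    source: str   # assumed: who or what makes the statement (user, tool)
    target: str   # assumed: the annotated item (experiment, saved search...)
    text: str
    parent: Optional["Annotation"] = field(default=None, repr=False, compare=False)
    children: List["Annotation"] = field(default_factory=list, repr=False, compare=False)
    rank: int = 0  # assumed: position among the parent's children

    def reply(self, source, text):
        """Attach an answer to this annotation, ranked after existing answers."""
        child = Annotation(source, self.target, text, parent=self,
                           rank=len(self.children))
        self.children.append(child)
        return child


def thread(annotation, depth=0):
    """Flatten a comment thread into indented lines, ordered by rank."""
    lines = ["  " * depth + f"{annotation.source}: {annotation.text}"]
    for child in sorted(annotation.children, key=lambda a: a.rank):
        lines.extend(thread(child, depth + 1))
    return lines


note = Annotation("user A", "saved search 42", "I'm studying IL2")
note.reply("user B", "Which time points?")
print("\n".join(thread(note)))
```

The parent and children fields are excluded from `repr` and equality so that the two-way links do not cause infinite recursion.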


Examples of use

Annotating a saved search:

Comments and answers to comments

Originating operations (import, intersection, merge...)

Which user is working on this data set

Why the data set is being saved:

Functional family

I'm studying IL2

I'm studying schistosomiasis
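Recording the originating operation of a saved search can be as simple as storing it with the result of each set operation. A minimal sketch, assuming a saved search is a named set of probe set IDs; the `SavedSearch` class, the `intersection`/`merge` functions and the sample data are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    name: str
    probes: frozenset  # probe set IDs in the saved data set
    origin: str        # originating operation, kept as provenance


def intersection(a, b, name):
    """New saved search with the probes common to a and b."""
    return SavedSearch(name, a.probes & b.probes, f"intersection({a.name}, {b.name})")


def merge(a, b, name):
    """New saved search with the probes of either a or b."""
    return SavedSearch(name, a.probes | b.probes, f"merge({a.name}, {b.name})")


up = SavedSearch("up at 2h", frozenset({"p1", "p2", "p3"}), "import")
il2 = SavedSearch("IL2 family", frozenset({"p2", "p3", "p4"}), "import")
both = intersection(up, il2, "IL2 family up at 2h")
print(sorted(both.probes), both.origin)
# ['p2', 'p3'] intersection(up at 2h, IL2 family)
```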


Examples of use: AMDA

Representation of an AMDA report in a structured form:

DEGs and gene clusters

This is a DEG set, computed by AMDA (PAM method) on samples s1, s2, s3

Correlation between chips (storing values and links to chip pairs)

Functional annotations of genes, by means of KEGG (with reporting of significance)

Import of analysis annotations into GCA

Presentation of analysis annotations together with data sets
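Storing correlation between chips means keeping one value per chip pair, linked to both chips. A minimal sketch using Pearson correlation; it does not reproduce AMDA's own computation, and the chip IDs and values are made up:

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def chip_correlations(chips):
    """chips maps a chip id to its values (same probe order on every chip).
    Returns one entry per unordered chip pair: (id1, id2) -> r."""
    return {(a, b): pearson(chips[a], chips[b])
            for a, b in combinations(sorted(chips), 2)}


chips = {"s1": [1.0, 2.0, 3.0, 4.0], "s2": [2.0, 4.0, 6.0, 8.0], "s3": [4.0, 3.0, 2.0, 1.0]}
for pair, r in chip_correlations(chips).items():
    print(pair, round(r, 3))
```

Keying each value by the sorted chip pair gives a natural link from the stored value back to both chips.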


GCA: other possible developments

Templates for experiment insertion

Advanced CVs (taxonomies, mapping to the MGED Ontology)

Knowledge management features (with or without the annotation ontology)

eGroupware, with links between eGroupware forums/documents and GCA experiments/data sets

Integration with AMDA (with or without the annotation ontology)

Exposing an API via Web Services technology

Integration with Taverna or Cytoscape

Connections with pathway databases (e.g., by means of Pathway Processor)


Thank you!

[email protected]

Find this presentation at: http://bioguest.btbs.unimib.it/~brandizi/