Semantic Web Technologies for Analysis of Transcriptome

Rose Dieng-Kuntz 1, Khaled Khelif 1, Olivier Corby 1, Pascal Barbry 2

1 INRIA - Sophia Antipolis, ACACIA project
http://www.inria.fr/acacia
http://www.inria.fr/acacia/corese

2 IPMC, Sophia Antipolis
http://www.ipmc.fr


Outline

• Context: Memory of Biochip Experiments

• The MEAT Project

• Semi-automatic generation of semantic annotations

• Conclusions: Requirements for Semantic Web

Context: Biochip experiments

• Applications in biology, medicine, pharmacology…:
  - Gene discovery
  - Disease diagnosis or prognosis
  - Drug discovery: Pharmacogenomics
  - Toxicological research: Toxicogenomics

• DNA microarrays (gene chips, biochips) make it possible to measure simultaneously the expression level and transcription rate of various genes in an organism.

Towards Biochip Experiment Memory

Need for knowledge management for a community of biologists: a biochip experiment memory

[Diagram: biologist, experiment sheets, domain ontologies, experiment DB, documents]

Need for support in validating & interpreting the results of biochip experiments

The MEAT Project

[Diagram of the MEAT components: MEDIANTE, MEAT-Annot&Search, MEAT-Miner, MEAT-Onto (UMLS, Gene Onto…)]

Phases: before experiment

Biologist checks & validates the probes available on the biochip & selects a subset

Ordering of slides to launch a new biochip experiment

Submission of journal articles on genes presumed to be of interest

Constitution of an electronic document corpus

Creation of semantic annotations on these articles with MEAT-Annot

Phases: after experiment

Storage of the experiment description and of its results in MEDIANTE, according to the Array Express format

Statistical analysis of results with MEAT-Miner

Interpretation of results, using further bibliographic searches

Addition of new semantic annotations on the experiment

MEAT-Annot&Search

[Architecture diagram. Components:]

- ARRAY-EXPRESS: experiment description, result description
- Article annotation base
- Result annotation base
- General knowledge base
- BRIGENE: annotation base
- CORESE search engine
- MEAT-Search: MEAT-dedicated query interface, result browsing interface
- MEAT-Annot (Annotation Acquisition Tool): manual annotation editor, automatic generation of annotations from a corpus

MEATAnnot: Technical Choices

NLP tools: term extractor + relation extractor

Extraction of terms corresponding to UMLS ontology concepts, from texts

Extraction of relations between these terms, from texts

Automatic generation of a semantic annotation and its representation in RDF

Relationship extraction

[Diagram: test corpus analysed by Syntex]

• Syntex (Bourigault D. 2000): syntactic analyser for corpora

• Used to reveal the « verb syntagms » commonly used in the biochip domain

Relationship extraction

• Choosing potential relationships revealed by Syntex

{Tag.lemme == "play"}{SpaceToken}
({Token.string == "a"} | {Token.string == "an"})?
({SpaceToken})?
({Token.string == "vital"} | {Token.string == "important"} | {Token.string == "critical"} | {Token.string == "some"} | {Token.string == "unexpected"} | {Token.string == "multifaceted"} | {Token.string == "major"})?
({SpaceToken})?
{Tag.lemme == "role"}

• Writing the relationship extraction grammar using JAPE
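As a rough illustration of what the JAPE pattern matches, the same « play … role » phrase can be approximated with a plain regular expression. This is only a sketch and not the MEAT-Annot grammar: JAPE matches on lemmas produced by a tagger, whereas here the inflected forms of "play" are listed by hand (an assumption), and the names `PLAY_ROLE` and `find_play_role` are hypothetical.

```python
import re

# Optional modifiers copied from the JAPE pattern on the slide.
MODIFIERS = ["vital", "important", "critical", "some",
             "unexpected", "multifaceted", "major"]

# Assumption: no lemmatiser is available, so the surface forms of the
# lemmas "play" and "role" are enumerated explicitly.
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"        # verb
    r"(?:(?:a|an)\s+)?"                          # optional article
    r"(?:(?:" + "|".join(MODIFIERS) + r")\s+)?"  # optional modifier
    r"roles?\b",                                 # noun
    re.IGNORECASE,
)


def find_play_role(sentence):
    """Return the matched 'play ... role' phrase, or None if absent."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None
```

A lemma-based grammar generalises better than this enumeration, which is one reason to write such patterns over tagger output rather than over raw strings.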

System architecture

[Architecture diagram. Components: documents, MeatAnnot, Gate API, UMLS Knowledge server, RDF annotations, biologist. The diagram also shows two extraction patterns: the « play … role » pattern of the previous slide and a variant of it ending in {Tag.lemme == "effects"}.]

Example

« HGF plays an important role in lung development »

The information extracted from this sentence is:

HGF: an instance of the concept « Amino Acid, Peptide or Protein »

lung development: an instance of the concept « Organ or Tissue Function »

HGF play role lung development: an instance of the relation « play role » between the two terms
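The extraction described above can be sketched on the example sentence as follows. This is a minimal illustration under stated assumptions, not the MeatAnnot implementation: `LEXICON` is a hypothetical two-entry stand-in for a UMLS lookup, `RELATION_CUES` is a hypothetical cue table, and the argument selection (nearest term before and after the cue) is a simplification chosen for this sketch.

```python
import re

# Hypothetical lexicon standing in for a UMLS lookup: term -> concept.
LEXICON = {
    "hgf": "Amino Acid, Peptide or Protein",
    "lung development": "Organ or Tissue Function",
}

# Hypothetical cue table: relation name -> surface pattern.
RELATION_CUES = {
    "play role": re.compile(r"\bplay\w*\b.*?\brole\b", re.IGNORECASE),
}


def find_terms(sentence):
    """Return (start, term, concept) for each lexicon term, in text order."""
    found = []
    lowered = sentence.lower()
    for term, concept in LEXICON.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            surface = sentence[match.start():match.end()]
            found.append((match.start(), surface, concept))
    return sorted(found)


def extract(sentence):
    """Return (subject, relation, object) triples with typed arguments."""
    terms = find_terms(sentence)
    triples = []
    for name, cue in RELATION_CUES.items():
        cue_match = cue.search(sentence)
        if not cue_match:
            continue
        # Simplification: nearest term before the cue is the subject,
        # nearest term after it is the object.
        before = [t for t in terms if t[0] < cue_match.start()]
        after = [t for t in terms if t[0] >= cue_match.end()]
        if before and after:
            subject, obj = before[-1], after[0]
            triples.append(((subject[1], subject[2]), name, (obj[1], obj[2])))
    return triples
```

Calling `extract("HGF plays an important role in lung development")` yields one triple linking the typed term HGF to the typed term lung development through « play role ».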

RDF Annotation Generated

<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:m='http://www.inria.fr/acacia/meat#'
         xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF#'>
    <m:play_role>
      <m:Organ_or_Tissue_Function rdf:about='lung_development#'/>
    </m:play_role>
  </m:Amino_Acid_Peptide_or_Protein>
</rdf:RDF>
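An annotation of this shape could be produced from an extracted triple with the Python standard library, as sketched below. The namespaces are those shown on the slide; the helper names (`class_name`, `resource_id`, `annotation`) and the naming rules they apply (spaces to underscores, commas dropped, trailing `#`) are assumptions inferred from the example, not the documented behaviour of MeatAnnot.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEAT_NS = "http://www.inria.fr/acacia/meat#"  # namespace shown on the slide

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("m", MEAT_NS)


def class_name(concept):
    """Assumed rule: drop commas and replace spaces with underscores."""
    return concept.replace(",", "").replace(" ", "_")


def resource_id(term):
    """Assumed rule: underscores for spaces, then a trailing '#'."""
    return term.replace(" ", "_") + "#"


def annotation(subject, subject_concept, relation, obj, obj_concept):
    """Serialise one typed relation as a nested RDF/XML description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    subject_el = ET.SubElement(
        root,
        f"{{{MEAT_NS}}}{class_name(subject_concept)}",
        {f"{{{RDF_NS}}}about": resource_id(subject)},
    )
    relation_el = ET.SubElement(
        subject_el, f"{{{MEAT_NS}}}{relation.replace(' ', '_')}"
    )
    ET.SubElement(
        relation_el,
        f"{{{MEAT_NS}}}{class_name(obj_concept)}",
        {f"{{{RDF_NS}}}about": resource_id(obj)},
    )
    return ET.tostring(root, encoding="unicode")
```

The output is unindented and the order of the namespace declarations may differ from the slide, but the element structure is the same.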

CORESE Semantic search engine

[Architecture diagram. Sources: ontologies, documents, XML, legacy systems. Users interact with the Semantic Web server through query, answer and push. Web stack: URI, UNICODE, XML, NAMESPACES, RDF, RDFS, ONTOLOGY, RULES, QUERIES. Inside the server, RDFS, RDF, queries and rules are mapped to CG support, CG base, CG query and CG rules; PROJECTION and INFERENCES produce the CG results.]

Example XML document:

<accident> <date> 19 Mai 2000 </date> <description> <facteur>le facteur </description></accident>

Schema in RDFS:

<rdfs:Class rdf:ID="thing"/>
<rdfs:Class rdf:ID="person">
  <rdfs:subClassOf rdf:resource="#thing"/>
</rdfs:Class>

Annotations in RDF formed by instances of schema in RDFS:

<ns:article rdf:about="http://intranet/articles/ecai.doc">
  <ns:title>MAS and Corporate Semantic Web</ns:title>
  <ns:author>
    <ns:person rdf:about="http://intranet/employee/id109" />
  </ns:author>
</ns:article>

Ontology-based query

[Diagram: Corese loads UMLS and the annotation base; biologists formulate queries in the interface, which submits them to Corese and returns the results.]
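What makes such a query ontology-based is that a query stated on general concepts also retrieves annotations typed with more specific ones. The sketch below illustrates only that subsumption idea; it does not use the Corese API or its query language. The hierarchy in `SUBCLASS_OF` is a hypothetical, simplified fragment rather than the actual UMLS hierarchy, and `ANNOTATIONS` holds just the example annotation from the earlier slide.

```python
# Hypothetical, simplified concept hierarchy: child -> parent.
SUBCLASS_OF = {
    "Amino_Acid_Peptide_or_Protein": "Chemical",
    "Organ_or_Tissue_Function": "Physiologic_Function",
}

# Annotation base reduced to the example annotation:
# (subject, subject type, relation, object, object type)
ANNOTATIONS = [
    ("HGF", "Amino_Acid_Peptide_or_Protein", "play_role",
     "lung_development", "Organ_or_Tissue_Function"),
]


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def query(subject_type, relation, object_type):
    """Return annotations whose argument types specialise the query types."""
    return [
        (subject, rel, obj)
        for subject, s_type, rel, obj, o_type in ANNOTATIONS
        if rel == relation
        and is_a(s_type, subject_type)
        and is_a(o_type, object_type)
    ]
```

With this toy data, asking which `Chemical` plays a role in a `Physiologic_Function` returns the HGF annotation, although neither general concept appears in the annotation itself.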

Semantic Web requirements

• Adaptation of Corese semantic search engine to OWL

• Corese query language vs SPARQL

• Contextual annotations: need to express multiple contexts / viewpoints

• Temporal queries on the base of past biochip experiments + temporally evolving ontologies & annotations

• Scalability of NLP tools: articles stemming from scientific monitoring of the open (semantic) Web…

Many thanks to

• ACACIA team: in particular Khaled Khelif, Laurent Alamarguy, Olivier Corby, Alain Giboin…

• IPMC: Pascal Barbry, Kevin Le Brigand, Hélène, Chimène, Yves

• Bayer Crop Science: Rémi Bars

• Didier Bourigault (ERSS), developer of Syntex

• The developers of GATE (Sheffield Univ.)

Support for a health network

[Architecture diagram. Components: documents (patient record, best practices guide …), Translator, Nautilus DB, medical ontology, semantic annotations, Corese search engine, Virtual Staff, Life Line, member of the health network]

Example patient record (XML):

<dossierPatient> <date> 19 Mai 2000 </date> <donneesAdministratives> <Patient><nom>Dupont</nom> <prenom> Michel </prenom> </Patient> </donneesAdministratives>…

Virtual Staff Architecture