1
Semantic Web Technologies for Analysis of Transcriptome
Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2)
(1) INRIA - Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese
(2) IPMC, Sophia Antipolis, http://www.ipmc.fr
2
Outline
• Context: Memory of Biochip Experiments
• The MEAT Project
• Semi-automatic generation of semantic annotations
• Conclusions: Requirements for Semantic Web
3
Context: Biochip experiments
• Applications in biology, medicine, pharmacology…:
Gene discovery
Disease diagnosis or prognosis
Drug discovery: Pharmacogenomics
Toxicological research: Toxicogenomics
• DNA microarrays (gene chips, biochips) make it possible to measure simultaneously the expression level and transcription rate of various genes in an organism.
4
Towards Biochip Experiment Memory
Need for knowledge management for a community of biologists: Biochip Experiment Memory
Experiment sheets
Domain Ontologies
Experiment DB
Documents
Biologist
Need for support in the validation & interpretation of results of biochip experiments
5
The MEAT Project
MEDIANTE
MEAT-Annot&Search
MEAT-Miner
MEAT-Onto: UMLS, Gene Onto…
6
Phases: before experiment
Biologist checks & validates probes available on the biochip & selects a subset
Ordering of slides to launch a new biochip experiment
Submission of journal articles on genes presumed to be of interest
Building of an electronic document corpus
Creation of semantic annotations on these articles with MEAT-Annot
7
Phases: after experiment
Storage of the experiment description and of its results in MEDIANTE, according to the ArrayExpress format
Statistical analysis of results with MEAT-Miner
Interpretation of results, using further bibliographic searches
Addition of new semantic annotations on the experiment
8
MEAT-Annot&Search
ARRAY-EXPRESS: experiment description, result description
Article annotation base
Result annotation base
General knowledge base
BRIGENE: Annotation base
CORESE Search engine
MEAT-Search: MEAT-dedicated query interface, result browsing interface
MEAT-Annot: Annotation Acquisition Tool
- Manual annotation editor
- Automatic generation of annotations from a corpus
9
MEATAnnot: Technical Choices
NLP tools: term extractor + relation extractor
Extraction of terms corresponding to UMLS Ontology concepts, from texts
Extraction of relations between these terms, from texts
Automatic generation of a semantic annotation and its representation in RDF
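The term extraction step can be illustrated with a minimal Python sketch. It is not MeatAnnot's implementation: the `LEXICON` dictionary is a hypothetical stand-in for the UMLS knowledge server, and `extract_terms` is a name introduced here for illustration. The sketch only shows the idea of mapping terms found in a text to ontology concepts.

```python
import re

# Hypothetical mini-lexicon standing in for the UMLS knowledge server.
# Keys are lower-cased terms, values are the concept labels used on the slides.
LEXICON = {
    "hgf": "Amino Acid, Peptide or Protein",
    "lung development": "Organ or Tissue Function",
}

def extract_terms(text, lexicon=LEXICON):
    """Return (term, concept) pairs for lexicon terms found in text, in reading order."""
    lowered = text.lower()  # assumes ASCII text, so offsets match the original
    found = []
    for term, concept in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            surface = text[match.start():match.end()]
            found.append((match.start(), surface, concept))
    # Sort by position in the text, then drop the offset.
    return [(surface, concept) for _, surface, concept in sorted(found)]

if __name__ == "__main__":
    for term, concept in extract_terms("HGF plays an important role in lung development"):
        print(term, "->", concept)
```

A real extractor would also handle morphological variants and synonyms, which a plain dictionary lookup ignores.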
10
Relationship extraction
Test corpus
Syntex
• Syntex (Bourigault D. 2000): syntactic corpus analyser
• Used to reveal the verb phrases (« verb syntagms ») commonly used in the biochip domain
11
Relationship extraction
• Choosing potential relationships revealed by Syntex
• Writing the relationship extraction grammar using JAPE:
{Tag.lemme == "play"}{SpaceToken}
({Token.string == "a"}|{Token.string == "an"})?({SpaceToken})?
({Token.string == "vital"}|{Token.string == "important"}|{Token.string == "critical"}|{Token.string == "some"}|{Token.string == "unexpected"}|{Token.string == "multifaceted"}|{Token.string == "major"})?({SpaceToken})?
{Tag.lemme == "role"}
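To make the JAPE pattern on this slide easier to read, here is a rough Python approximation as a regular expression. It is only a sketch: JAPE matches on token annotations and lemmas produced by GATE, whereas this version works on raw text and lists the inflected forms of "play" and "role" by hand. The names `PLAY_ROLE` and `find_play_role` are introduced here for illustration.

```python
import re

# Approximation of the "play ... role" grammar rule:
# lemma "play", optional article, optional qualifying adjective, lemma "role".
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"
    r"(?:(?:a|an)\s+)?"
    r"(?:(?:vital|important|critical|some|unexpected|multifaceted|major)\s+)?"
    r"(?:role|roles)\b",
    re.IGNORECASE,
)

def find_play_role(sentence):
    """Return the matched relation phrase, or None if the sentence has none."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None

if __name__ == "__main__":
    print(find_play_role("HGF plays an important role in lung development"))
```

Working on lemmas, as the JAPE rule does, avoids enumerating inflected forms and is the more robust choice when a tagger is available.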
12
System architecture
Documents
MeatAnnot (GATE API), with JAPE grammars such as:
{Tag.lemme == "play"}{SpaceToken}({Token.string == "a"}|{Token.string == "an"})?({SpaceToken})?({Token.string == "vital"}|{Token.string == "important"}|{Token.string == "critical"}|{Token.string == "some"}|{Token.string == "unexpected"}|{Token.string == "multifaceted"}|{Token.string == "major"})?({SpaceToken})?{Tag.lemme == "effects"}
UMLS Knowledge server
RDF Annotations
Biologist
13
Example
« HGF plays an important role in lung development »
The information extracted from this sentence is:
HGF : an instance of the concept « Amino Acid, Peptide or protein »
lung development : an instance of the concept « organ or tissue function »
HGF play role lung development : an instance of the relation « play role » between the two terms
14
RDF Annotation Generated
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:m='http://www.inria.fr/acacia/meat#'
         xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF#'>
    <m:play_role>
      <m:Organ_or_Tissue_Function rdf:about='lung_development#'/>
    </m:play_role>
  </m:Amino_Acid_Peptide_or_Protein>
</rdf:RDF>
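An annotation of this shape can be produced programmatically. The following Python sketch uses only the standard library; the function names `to_identifier` and `build_annotation` are hypothetical and the code is an illustration of the output format, not the MeatAnnot generator itself. The namespace URIs are those shown on the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEAT_NS = "http://www.inria.fr/acacia/meat#"

def to_identifier(label):
    """Turn a label into an identifier: commas dropped, whitespace replaced by '_'."""
    return "_".join(label.replace(",", " ").split())

def build_annotation(subject, subject_type, relation, obj, obj_type):
    """Serialise one (subject, relation, object) statement as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("m", MEAT_NS)
    about = "{%s}about" % RDF_NS
    root = ET.Element("{%s}RDF" % RDF_NS)
    subject_el = ET.SubElement(
        root,
        "{%s}%s" % (MEAT_NS, to_identifier(subject_type)),
        {about: to_identifier(subject) + "#"},
    )
    relation_el = ET.SubElement(subject_el, "{%s}%s" % (MEAT_NS, to_identifier(relation)))
    ET.SubElement(
        relation_el,
        "{%s}%s" % (MEAT_NS, to_identifier(obj_type)),
        {about: to_identifier(obj) + "#"},
    )
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(build_annotation(
        "HGF", "Amino Acid, Peptide or Protein",
        "play role",
        "lung development", "Organ or Tissue Function",
    ))
```

Building the tree with an XML library rather than string concatenation keeps the output well-formed whatever the term labels contain.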
15
CORESE Semantic search engine
Sources: Ontologies, Documents, XML, Legacy systems, Users
Example XML document:
<accident> <date> 19 Mai 2000 </date> <description> <facteur>le facteur </description></accident>
Schema in RDFS:
<rdfs:Class rdf:ID="thing"/>
<rdfs:Class rdf:ID="person"> <rdfs:subClassOf rdf:resource="#thing"/></rdfs:Class>
Annotations in RDF formed by instances of schema in RDFS:
<ns:article rdf:about="http://intranet/articles/ecai.doc"> <ns:title>MAS and Corporate Semantic Web</ns:title> <ns:author> <ns:person rdf:about="http://intranet/employee/id109" /> </ns:author></ns:article>
Web stack: URI, UNICODE, XML, NAMESPACES, RDF, RDFS, ONTOLOGY, RULES, QUERIES
Semantic Web server (RDF/S):
RDFS: CG Support
RDF: CG Base
Queries: CG Query
Rules: CG Rules
PROJECTION and INFERENCES: CG Results
Interactions: query, answer, push
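The benefit of querying annotations through an RDFS schema can be shown with a small Python sketch. It uses the `thing` / `person` classes and the resources from the slide, and it is a deliberate simplification: Corese relies on conceptual graph projection, whereas this sketch only follows `subClassOf` links in plain dictionaries. The names `superclasses` and `instances_of` are introduced here for illustration.

```python
# Schema: person is a subclass of thing (as in the RDFS excerpt on the slide).
SUB_CLASS_OF = {"person": ["thing"]}

# Annotations: resource -> declared class (taken from the RDF excerpt on the slide).
TYPES = {
    "http://intranet/employee/id109": "person",
    "http://intranet/articles/ecai.doc": "article",
}

def superclasses(cls, sub_class_of):
    """Return cls together with all its ancestors, following subClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    return seen

def instances_of(cls, types, sub_class_of):
    """Return the resources whose declared class is cls or one of its subclasses."""
    return sorted(
        resource
        for resource, declared in types.items()
        if cls in superclasses(declared, sub_class_of)
    )

if __name__ == "__main__":
    # The employee is declared as a person, yet is also returned for "thing".
    print(instances_of("thing", TYPES, SUB_CLASS_OF))
```

A query for the general class thus retrieves resources annotated with a more specific class, which is what distinguishes ontology-based search from keyword matching.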
16
Ontology-based query
Biologists: formulate queries in the Interface
Interface: submits queries to Corese
Corese: loads the Annotation Base and UMLS
Corese: returns results
17
Semantic Web requirements
• Adaptation of the Corese semantic search engine to OWL
• Corese query language vs SPARQL
• Contextual annotations: need to express multiple contexts / viewpoints
• Temporal queries on the base of past biochip experiments + temporally evolving ontologies & annotations
• Scalability of NLP tools: articles stemming from scientific watch on the open (semantic) Web…
18
Many thanks to
• ACACIA team: in particular Khaled Khelif, Laurent Alamarguy, Olivier Corby, Alain Giboin…
• IPMC: Pascal Barbry, Kevin Le Brigand, Hélène, Chimène, Yves
• Bayer Crop Science: Rémi Bars
• Didier Bourigault (ERSS), developer of Syntex
• The developers of GATE (Sheffield Univ.)
19
Support to health network
<dossierPatient> <date> 19 Mai 2000 </date> <donneesAdministratives> <Patient><nom>Dupont</nom> <prenom> Michel </prenom> </Patient> </donneesAdministratives>…
Documents (Patient record, Best practices Guide …)
Medical Ontology
Semantic Annotations
Translator
Nautilus DB
Corese search engine
Virtual Staff
Life Line
Member of the health network
20
Virtual Staff Architecture