The Ultralink – an expert system for contextual hyperlinking in knowledge management
Manuel C. Peitsch
Head of Systems Biology, Novartis Institutes for Biomedical Research
2 Ultralink/ M. Peitsch/ Bio-IT World April 2006
A Knowledge Space
Connecting the Knowledge Bodies (requirements)
Intelligent integration of heterogeneous data to enable “Seamless Navigation”:
One-stop shop.
Re-useable, in any Web and Office application.
Intelligent, i.e. knows about biology, medicine, chemistry, diseases, business, people, etc…
On demand and easy to use.
Configurable.
Connecting the Knowledge Bodies (Components)
Indexing of large heterogeneous data collections (databases, full texts) to enable semantic expansion.
Information Retrieval and Extraction, entity recognition, semantic enrichment.
Knowledge Map (navigating the conceptual network).
Terminology Hub (thesauri and ontologies).
Ontology-associated business rules.
Creating references (Terminology Hub)
Different knowledge repositories have different ways to encode a concept:
Registry Number
Unique Internal ID
Concept Identifier
Enumerating terms
Just using different terms without any constraints
Searching for a term in sources A and B may lead to different results although the underlying concept exists in both sources (false negatives in IR and IE).
Terminology Hub ensures coherent mapping
Between coding systems
Between different representation levels (e.g. ID vs. Concept)
Between local terms and global terms
Over 8 GB of cross-referencing information
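The mapping role of the Terminology Hub can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `TerminologyHub`, the source names `source_a` and `source_b`, and the concept identifier `C0001` are hypothetical and do not come from the slides. The sketch only shows the idea of routing every local code or term through a shared concept identifier so that a lookup in one source can be translated into another.

```python
from collections import defaultdict


class TerminologyHub:
    """Toy cross-reference table: (source, local term) <-> concept ID."""

    def __init__(self):
        self._to_concept = {}                    # (source, key) -> concept ID
        self._from_concept = defaultdict(dict)   # concept ID -> {source: term}

    def register(self, concept_id, source, local_term):
        # Case-insensitive key so "Nephroblastoma" and "nephroblastoma" agree.
        self._to_concept[(source, local_term.lower())] = concept_id
        self._from_concept[concept_id][source] = local_term

    def concept_for(self, source, local_term):
        return self._to_concept.get((source, local_term.lower()))

    def translate(self, source, local_term, target_source):
        """Map a term from one source to its equivalent in another source."""
        concept_id = self.concept_for(source, local_term)
        if concept_id is None:
            return None
        return self._from_concept[concept_id].get(target_source)


hub = TerminologyHub()
# "C0001", "source_a" and "source_b" are made-up identifiers.
hub.register("C0001", "source_a", "nephroblastoma")
hub.register("C0001", "source_b", "Wilms' tumor")
print(hub.translate("source_a", "Nephroblastoma", "source_b"))
```

With such a table, a search for a term known only to source A can be expanded to the term used in source B, which is the false-negative case described above.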
What entities constitute our Terminology?
Chemical entities – IUPAC names, trivial names, trade names, INNs, compound codes, ligands.
Biological entities – targets, genes/protein, modes of actions…
Diseases, Indications, Side Effects, Contraindications
Institutions, Affiliations, People
Geographic locations
…
The UltraLink: a revolutionary tool to navigate the “Knowledge Space”
1. Zoning: this process uses our (meta-)knowledge about information structure and tags the relevant contexts of the documents or database records.
2. Identification of terms, based on the terminology or on regular expressions:
Term Identification: identify the lexical items in a text, relate them to a term, and retrieve the corresponding reference term via thesaurus relations.
Concept Identification: identify the concept related to the reference term(s).
Type Assignment: assign the concept type related to a concept identifier.
3. Extraction and normalization
4. Get the list of rules to apply; verifiers; application of rules
5. Display
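Step 2 of the process above (term identification, concept identification, type assignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual UltraLink implementation: the dictionaries `THESAURUS` and `CONCEPTS`, the identifier `CONCEPT:0001`, and the function `identify` are hypothetical, and the TARGET type for mTOR is assumed. Zoning, rule application and display are not covered.

```python
import re

# Hypothetical miniature terminology: lexical variant -> reference term.
THESAURUS = {
    "mtor": "mammalian target of rapamycin",
    "mammalian target of rapamycin": "mammalian target of rapamycin",
}

# Reference term -> (concept identifier, concept type); both are made up.
CONCEPTS = {
    "mammalian target of rapamycin": ("CONCEPT:0001", "TARGET"),
}


def identify(text):
    """Return the terms found in text with reference term, concept and type."""
    hits = []
    # Try longer variants first so multi-word terms take precedence.
    for variant in sorted(THESAURUS, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip a match that overlaps a span already recorded.
            if any(m.start() < h["end"] and h["start"] < m.end() for h in hits):
                continue
            reference_term = THESAURUS[variant]              # term identification
            concept_id, concept_type = CONCEPTS[reference_term]  # concept + type
            hits.append({
                "start": m.start(),
                "end": m.end(),
                "surface": m.group(0),
                "reference_term": reference_term,
                "concept_id": concept_id,
                "concept_type": concept_type,
            })
    return sorted(hits, key=lambda h: h["start"])


hits = identify("mTOR (mammalian target of rapamycin) inhibitors")
for h in hits:
    print(h["surface"], "->", h["reference_term"], h["concept_type"])
```

Both surface forms resolve to the same reference term and concept, which is what allows one link target to be attached to every variant of a term.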
Creating the Ultralink
UltraLink Examples
Treatment of ambiguities
Wilms tumor:
DISEASE: Wilms' tumor => nephroblastoma
GENE NAME: WT1
TARGET: Wilms' tumor
UltraLink Examples
Term extraction / Normalization -> Examples (mtor, mammalian target of rapamycin)
UltraLink Examples
Term to UltraLink:
granulocyte - macrophage colony stimulating factor
Concept Type: TARGET
Normalized term (non exhaustive):
Granulocyte-macrophage colony-stimulating factor
Synonyms:
colony stimulating factor 2
Colony-stimulating factor, CSF, GCSF, GM-CSF
Granulocyte macrophage colony stimulating factor
Molgramostim, Sargramostim
Local terms (non exhaustive list of examples):
EMBL e.g. AC004511, AF373868, …
Pubmed e.g. 1569568, 1737041, …
NCBI e.g. 10090, 10116, 9606
GO e.g. GO:000512, GO:0019221, …
UniProt e.g. P01587, …
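The record above can be pictured as a small data structure, with a normalization step that lets spelling variants of a term match the same entry. This is only a sketch: the class `UltraLinkEntry`, its field names, and the `normalize_key` rule (lowercase, hyphens and whitespace treated as one space) are assumptions for illustration; the slides do not describe the actual normalization rules. The example values are taken from the list above.

```python
import re
from dataclasses import dataclass, field


def normalize_key(term: str) -> str:
    """Lowercase and treat hyphens and runs of whitespace as a single space."""
    return re.sub(r"[\s\-]+", " ", term).strip().lower()


@dataclass
class UltraLinkEntry:
    normalized_term: str
    concept_type: str
    synonyms: list = field(default_factory=list)
    local_terms: dict = field(default_factory=dict)  # source name -> local IDs

    def matches(self, surface: str) -> bool:
        """True if the surface form equals the normalized term or a synonym."""
        key = normalize_key(surface)
        candidates = [self.normalized_term, *self.synonyms]
        return any(normalize_key(c) == key for c in candidates)


entry = UltraLinkEntry(
    normalized_term="Granulocyte-macrophage colony-stimulating factor",
    concept_type="TARGET",
    synonyms=["colony stimulating factor 2", "GM-CSF", "Sargramostim"],
    local_terms={"UniProt": ["P01587"], "NCBI": ["10090", "10116", "9606"]},
)
print(entry.matches("granulocyte - macrophage colony stimulating factor"))
```

A lookup keyed on the normalized form means the hyphenation and spacing variants listed as synonyms do not each need their own entry.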
Link to Pathway information from TCF7L2
Further query, Genome/Proteome Information
Access to tools and reagents
MetaCore Map containing FZD4 (frizzled 4)
Proteins where antibodies are available are marked with an additional icon.
Mouse-over shows specificity.
Hyperlink to the Antibodies Web Report.
The Ultralink can be called from Internet Explorer
Internet Explorer Integration / GPS Add-in
Web Page -> Tagged Document
1. User requests analysis
2. Sends the document for analysis
3. Gets back tagged parts
4. Injection of specific HTML tags
Web Service (WSDL)
GPS Lexical Analysis Server Tools: Terminology, Lexical Extraction, Zoning, Tagging, DocStructures, Meta-Rules
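The last step of this workflow, injecting tags around the parts returned by the analysis service, might look like the sketch below. It is a simplified illustration working on plain text rather than on a live browser document; the function `inject_tags`, the `ultralink` class name and the `data-concept-type` attribute are hypothetical, as the slides do not specify which HTML tags the add-in injects.

```python
from html import escape


def inject_tags(text, spans):
    """Wrap each (start, end, concept_type) span of text in a <span> element.

    Spans are assumed to be non-overlapping character offsets into text.
    """
    out = []
    cursor = 0
    for start, end, concept_type in sorted(spans):
        out.append(escape(text[cursor:start]))   # untouched text before the term
        out.append(
            '<span class="ultralink" data-concept-type="%s">%s</span>'
            % (escape(concept_type, quote=True), escape(text[start:end]))
        )
        cursor = end
    out.append(escape(text[cursor:]))            # remaining text after last term
    return "".join(out)


tagged = inject_tags("WT1 is mutated", [(0, 3, "GENE NAME")])
print(tagged)
```

Escaping the surrounding text keeps the result well-formed even when the page content contains characters such as `<` or `&`.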
Activation UltraLink
Annotations on any web page
The Ultralink is integrated with Microsoft Office
Microsoft Smart Tag / Extraction
Office Document -> Tagged Document
1. User requests analysis
2. Sends the document for analysis
3. Gets back tagged parts
Web Service (WSDL)
GPS Lexical Analysis Server Tools: Terminology, Lexical Extraction, Zoning, Tagging, DocStructures, Meta-Rules
Annotations on full text in a Word document
Ultralinks can be coded into any application
Acknowledgements
Thérèse Vachon
Martin Romacker
Pierre Parisot
Nicolas Grandjean
Brigitte Charpiot
Jean-Marc von Allmen
Daniel Cronenberger
Olivier Kreim
Knowledge Space and GPS Navigator
Backup slides
What constitutes the Knowledge Space
Ultralinker
Semantic Search
Text Mining
Analytics
Internet
Other Research Documentation
Chemistry
Biology
Literature
Comp. Inf.
Bioinformatics
Meta Data K map
Thesauri, Ontologies
Rules
Defined workflows
Data Analysis – Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects
Univariate - Companies
Univariate - MOA
Univariate - Diseases conditioned by Companies
Clustering Diseases - MOAs
Graph Navigator – Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects
Clustering