Presentation about semantic annotation of biomedical data. Presented at LIRMM, INRIA and others between 2008 and 2010.
2. Speech overview
3. Ontology-based annotation workflow: concept recognition, semantic expansion, why is it hard?
4. Annotation services: the NCBO Annotator web service, the NCBO biomedical resources index
5. Users & use cases
6. Conclusion and future work
INRIA - EXMO seminar - March 24th, 2010
7. Annotation & semantic web
8. Web content must be semantically described using ontologies
9. Semantic annotations help to structure the web
10. Annotation is not an easy task
11. Automatic vs. manual
12. Lack of annotation tools (convenient, simple to use and easily integrated into automatic processes)
13. Today's web content (& public data available through the web) is mainly composed of unstructured text
14. Annotation is not a common practice
15. Getting access to all is hard: formats, locations, APIs
16. Lack of tools that easily access all ontologies (domain)
17. Users do not always know the structure of an ontology's content or how to use it in order to do the annotations themselves
18. Lack of tools to do the annotations automatically
19. Boring additional task without immediate reward for the user
20. Biomedical context
21. Very diverse, growing very fast
22. Most of the data are unstructured and rarely described with ontology concepts available in the domain
23. Hard for biomedical researchers to find the data they need
24. Data integration problem
25. Translational discoveries are prevented
26. Good example of use of ontologies and terminologies for annotations
27. Gene Ontology annotations
28. PubMed (biomedical literature) indexed with MeSH headings
29. Limitations
30. UMLS only, almost nothing for OBO & OWL ontologies
31. Manual approaches, curators (scalability?)
32. Automatic approaches (usability & accuracy?)
33. The challenge
34. Large scale: to scale up for many resources and ontologies
35. Automatic: to keep precision and accuracy
36. Easy to use and to access: to prevent the biomedical community from getting lost
37. Customizable: to fit very specific needs
38. Smart: to leverage the knowledge contained in ontologies
39. Vocabulary
40. Element, e.g., a dataset, clinical-trial description, research article, imaging study
41. Text metadata = the set of free text that describes or annotates an element
42. Resource = a collection of elements
43. GEO, PubMed, ClinicalTrials.gov, Guideline.gov, ArrayExpress
44. Concept = a unique entity (class) in a specific ontology (has a URI)
45. UMLS CUI or NCBO URI, e.g., C0025202, DOID:1909
46. Term = a string that identifies a given concept (name, synonyms)
47. Melanoma, Melanomas, Malignant melanoma
48. Annotation = meta-information on a piece of data: this data deals with this concept
49. PMID17984116 deals with C0025202
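The vocabulary above can be read as a small data model. The following is a minimal sketch, not the NCBO schema: the class and field names are assumptions made for illustration, and only the identifiers and terms (C0025202, PMID17984116, the Melanoma synonyms) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A unique entity (class) in a specific ontology."""
    ontology: str      # label chosen for this sketch, e.g. "UMLS"
    local_id: str      # e.g. "C0025202" or "DOID:1909"
    terms: tuple = ()  # strings that identify the concept (name, synonyms)


@dataclass
class Element:
    """One item of a resource, with the free text that describes it."""
    element_id: str
    text_metadata: str = ""


@dataclass
class Resource:
    """A collection of elements, e.g. PubMed."""
    name: str
    elements: list = field(default_factory=list)


@dataclass(frozen=True)
class Annotation:
    """Meta-information: this element deals with this concept."""
    element_id: str
    concept: Concept


melanoma = Concept("UMLS", "C0025202",
                   ("Melanoma", "Melanomas", "Malignant melanoma"))
article = Element("PMID17984116")
pubmed = Resource("PubMed", [article])
annotation = Annotation(article.element_id, melanoma)
print(f"{annotation.element_id} deals with {annotation.concept.local_id}")
```

Keeping terms on the concept, rather than on the annotation, reflects the slide's distinction: a term is only a string, while the annotation points at the concept itself.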
50. Why use ontologies?
They structure the knowledge from a domain
They specify terms that can be used by natural language processing algorithms to process text
They uniquely identify concepts (URI)
They specify relations between concepts that can be used for computing concept similarity
They define hierarchies allowing abstraction of type
They play the role of common denominator for various data from a domain
51. Why use ontologies?
52. Why is it a hard problem? (1/2)
53. May involve NLP, stemming, spell-checking, or recognition of morphological variants
54. Concept disambiguation
55. Scalability issues
56. We want to deal with millions of concepts (~4M)
57. 200+ ontologies in several formats, spread out
58. Huge biomedical resources, e.g., PubMed with 17M citations
59. What to do with annotations when the ontologies and the resources evolve over time?
60. e.g., elements in resources are added
61. e.g., concepts in ontologies are removed
62. Why is it a hard problem? (2/2)
How to leverage the knowledge contained in ontologies?
Process the transitive closure for relations (not trivial for ontologies with 300k concepts)
Execute semantic distance algorithms to determine similarity
Compute mappings between ontologies to connect ontologies to one another
Keep all of this up to date when ontologies evolve
e.g., new GO version every day
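To make the transitive-closure point concrete, here is a minimal sketch of computing all ancestors of every concept over is_a links. The function name and the toy concept ids (A, B, C, D) are hypothetical, and the sketch assumes the is_a graph is acyclic. It is not the method used in the talk.

```python
def is_a_closure(parents):
    """Map every concept to the set of all its ancestors.

    `parents` maps a concept id to the list of its direct is_a parents.
    Results are memoized so each concept is expanded only once.
    """
    closure = {}

    def ancestors(concept):
        if concept not in closure:
            found = set()
            for parent in parents.get(concept, []):
                found.add(parent)
                found |= ancestors(parent)
            closure[concept] = found
        return closure[concept]

    for concept in parents:
        ancestors(concept)
    return closure


# Toy hierarchy: D has two parents, which share the root A.
toy = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
closure = is_a_closure(toy)
print(sorted(closure["D"]))
```

Even in this toy case the closure holds more pairs than the graph has direct links (5 ancestor pairs against 4 is_a edges). The recursion would also need replacing with an explicit stack for deep hierarchies, and the whole table has to be recomputed or patched whenever an ontology changes.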
63. Ontology-based annotation workflow
First, direct annotations are created by recognizing concepts in raw text.
Second, annotations are semantically expanded using knowledge of the ontologies.
Third, all annotations are scored according to the context in which they have been created.
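The three steps can be sketched as one small function that chains them. Everything here is an illustrative assumption: the function name, the stub recognizer and expander, and in particular the weights (10 for a direct annotation, 5 for an is_a expansion), which are made-up values and not scores from the talk.

```python
def annotate(text, recognize, expand, weights):
    """Run the three workflow steps and return {concept_id: score}."""
    # Step 1: direct annotations from concept recognition in the raw text.
    annotations = [(concept, "direct") for concept in recognize(text)]

    # Step 2: semantic expansion of each direct annotation.
    for concept, _context in list(annotations):
        annotations.extend(expand(concept))

    # Step 3: score each annotation by the context that created it.
    scores = {}
    for concept, context in annotations:
        scores[concept] = scores.get(concept, 0) + weights[context]
    return scores


# Hypothetical stubs standing in for the real recognizer and expander.
def toy_recognize(text):
    return ["DOID:1909"] if "melanoma" in text.lower() else []


def toy_expand(concept):
    return [("DOID:191", "is_a")] if concept == "DOID:1909" else []


scores = annotate("A study of melanoma.", toy_recognize, toy_expand,
                  {"direct": 10, "is_a": 5})
print(scores)
```

Passing the recognizer and expander as arguments keeps the steps independent, so either one can be swapped without touching the scoring.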
64. Concept recognition (step 1)
65. 220 ontologies, ~4.2M concepts & ~7.9M terms
Uses NCIBI Mgrep, a syntactic concept recognizer
High degree of accuracy
Fast, scalable
Domain independent
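As an illustration of what syntactic concept recognition means, here is a naive dictionary matcher that looks up term strings in text and prefers the longest match. It is a stand-in written for this sketch and does not reproduce how Mgrep works. The function names and the sample sentence are made up; the terms and concept ids are the ones used elsewhere in the slides.

```python
import re


def build_dictionary(concept_terms):
    """Map each lower-cased term (as a tuple of tokens) to its concept id."""
    dictionary = {}
    for concept_id, terms in concept_terms.items():
        for term in terms:
            dictionary[tuple(term.lower().split())] = concept_id
    return dictionary


def recognize(text, dictionary):
    """Return (concept_id, matched_text) pairs, longest match first."""
    tokens = re.findall(r"\w+", text.lower())
    longest = max((len(key) for key in dictionary), default=0)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at token i, then shorter ones.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in dictionary:
                matches.append((dictionary[candidate], " ".join(candidate)))
                i += size
                break
        else:
            i += 1  # no term starts at this token
    return matches


dictionary = build_dictionary({
    "C0025202": ["Melanoma", "Melanomas", "Malignant melanoma"],
    "C0025201": ["Melanocyte"],
})
matches = recognize("Malignant melanoma arises from a melanocyte.", dictionary)
print(matches)
```

A matcher like this handles only exact, case-insensitive strings, which is why the earlier slide lists stemming, spell-checking and morphological variants as part of the difficulty.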
66. Semantic expansion (step 2)
67. Uses mappings in UMLS Metathesaurus and NCBO BioPortal
68. Uses semantic-similarity algorithms based on the is_a graph (ongoing work)
69. Components available as web services
70. An example
71. NCI/C0025201, Melanocyte in NCI Thesaurus
72. 39228/DOID:1909, Melanoma in Human Disease
73. Is_a closure expansion
74. 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease
75. 39228/DOID:0000818, cell proliferation disease, grandparent of Melanoma in Human Disease
76. Mapping expansion
77. FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS.
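The example can be replayed with two small lookup tables. This is a sketch under assumptions: the table layout and function name are hypothetical, each concept is given a single parent, and DOID:0000818 is taken to be the direct parent of DOID:191 because the slide calls it the grandparent of Melanoma. The identifiers themselves are the ones listed in the example.

```python
# is_a parents in Human Disease, as listed in the example.
PARENT = {
    "39228/DOID:1909": "39228/DOID:191",     # Melanoma -> Melanocytic neoplasm
    "39228/DOID:191": "39228/DOID:0000818",  # -> cell proliferation disease
}
# UMLS mapping from the example, stored in one direction only.
MAPPINGS = {"NCI/C0025201": ["FMA/C0025201"]}


def expand(direct):
    """Return (concept, context, source) triples for each expansion."""
    expanded = []
    for concept in direct:
        # Is_a closure expansion: walk up the hierarchy, counting levels.
        level, current = 1, PARENT.get(concept)
        while current is not None:
            expanded.append((current, f"is_a closure, level {level}", concept))
            level, current = level + 1, PARENT.get(current)
        # Mapping expansion: add concepts mapped to the direct one.
        for mapped in MAPPINGS.get(concept, []):
            expanded.append((mapped, "mapping", concept))
    return expanded


direct = ["NCI/C0025201", "39228/DOID:1909"]
for concept, context, source in expand(direct):
    print(f"{concept}  ({context}, from {source})")
```

Recording the context and the source concept with each expanded annotation is what lets a later scoring step treat a direct parent differently from a grandparent or a mapped concept.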