Semantic annotation of biomedical data

Clement Jonquet
jonquet@stanford.edu
INRIA - EXMO seminar - March 24th, 2010

Talk overview

Introduction: semantic annotation, semantic web, biomedical context, the challenge
Ontology-based annotation workflow: concept recognition, semantic expansion, why is it hard?
Annotation services: the NCBO Annotator web service, the NCBO biomedical resources index
Users & use cases
Conclusion and future work

Annotation & semantic web

Part of the vision for the semantic web
Web content must be semantically described using ontologies
Semantic annotations help to structure the web
Annotation is not an easy task
Automatic vs. manual
Lack of annotation tools (convenient, simple to use and easily integrated into automatic processes)
Today's web content (& public data available through the web) is mainly composed of unstructured text

Annotation is not a common practice

High number of ontologies
Getting access to all of them is hard: formats, locations, APIs
Lack of tools that easily access all ontologies (domain)
Users do not always know the structure of an ontology's content or how to use it in order to do the annotations themselves
Lack of tools to do the annotations automatically
Boring additional task without immediate reward for the user

Biomedical context

Explosion of publicly available biomedical data
Very diverse, growing very fast
Most of the data are unstructured and rarely described with the ontology concepts available in the domain
Hard for biomedical researchers to find the data they need
Data integration problem
Translational discoveries are prevented
Good examples of the use of ontologies and terminologies for annotations:
Gene Ontology annotations
PubMed (biomedical literature) indexed with MeSH headings
Limitations:
UMLS only, almost nothing for OBO & OWL ontologies
Manual approaches, curators (scalability?)
Automatic approaches (usability & accuracy?)

The challenge

Automatically process a piece of raw text to annotate it with relevant ontologies
Large scale: to scale up for many resources and ontologies
Automatic: to keep precision and accuracy
Easy to use and to access: to prevent the biomedical community from getting lost
Customizable: to fit very specific needs
Smart: to leverage the knowledge contained in ontologies

Vocabulary

Element = a collection of observations resulting from a biomedical experiment or study
e.g., a dataset, clinical-trial description, research article, imaging study
Text metadata = the set of free text that describes or annotates an element
Resource = a collection of elements
e.g., GEO, PubMed, ClinicalTrials.gov, Guideline.gov, ArrayExpress
Concept = a unique entity (class) in a specific ontology (has a URI)
UMLS CUI or NCBO URI, e.g., C0025202, DOID:1909
Term = a string that identifies a given concept (name, synonyms)
e.g., Melanoma, Melanomas, Malignant melanoma
Annotation = meta-information on a piece of data: this data deals with this concept
e.g., PMID17984116 deals with C0025202

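The vocabulary above can be sketched as a small data model. This is only an illustration: the class and field names are assumptions made for this example, not the data model of the NCBO services, and the element text is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    """A unique entity (class) in a specific ontology."""
    concept_id: str               # e.g. "C0025202" or "DOID:1909"
    terms: Tuple[str, ...] = ()   # strings that identify the concept


@dataclass
class Element:
    """One item of a resource, e.g. a research article or a dataset."""
    element_id: str      # e.g. "PMID17984116"
    text_metadata: str   # free text that describes the element


@dataclass
class Resource:
    """A collection of elements, e.g. PubMed or GEO."""
    name: str
    elements: List[Element] = field(default_factory=list)


@dataclass(frozen=True)
class Annotation:
    """Meta-information: this element deals with this concept."""
    element_id: str
    concept_id: str


# The example from the vocabulary list: PMID17984116 deals with C0025202.
melanoma = Concept("C0025202", ("Melanoma", "Melanomas", "Malignant melanoma"))
article = Element("PMID17984116", "placeholder for the article's free text")
pubmed = Resource("PubMed", [article])
annotation = Annotation(article.element_id, melanoma.concept_id)
```

Keeping annotations as plain (element, concept) pairs means that the terms and the free text can change without touching the annotations themselves.
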
Why use ontologies?
They structure the knowledge from a domain
They specify terms that can be used by natural language processing algorithms to process text
They uniquely identify concepts (URI)
They specify relations between concepts that can be used for computing concept similarity
They define hierarchies allowing abstraction of type
They play the role of common denominator for various data from a domain

Why is it a hard problem? (1/2)

Identifying concepts in text is a hard task
May involve NLP, stemming, spell-checking, or recognition of morphological variants
Concept disambiguation
Scalability issues
We want to deal with millions of concepts (~4M)
200+ ontologies in several formats, spread out
Huge biomedical resources, e.g., PubMed with 17M citations
What to do with annotations when the ontologies and the resources evolve over time?
e.g., elements in resources are added
e.g., concepts in ontologies are removed

Why is it a hard problem? (2/2)
How to leverage the knowledge contained in ontologies?
Process the transitive closure for relations (not trivial for ontologies with 300k concepts)
Execute semantic distance algorithms to determine similarity
Compute mappings between ontologies to connect ontologies to one another
Keep all of this up to date when ontologies evolve
e.g., a new GO version every day

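To make the transitive-closure point concrete, here is a minimal sketch that computes, for every concept, the set of all its is_a ancestors. The function name and the dictionary layout are assumptions for illustration. The toy hierarchy reuses the Human Disease identifiers from the example in this talk. A production system handling 300k concepts would likely need a precomputed, persisted closure rather than this in-memory walk.

```python
from typing import Dict, List, Set


def is_a_closure(parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each concept to the set of all its ancestors via is_a."""
    closure: Dict[str, Set[str]] = {}
    for concept in parents:
        seen: Set[str] = set()
        stack = list(parents[concept])
        # Iterative walk up the hierarchy; `seen` also guards against cycles.
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, []))
        closure[concept] = seen
    return closure


# Toy is_a hierarchy (child -> direct parents).
PARENTS = {
    "DOID:1909": ["DOID:191"],       # Melanoma
    "DOID:191": ["DOID:0000818"],    # Melanocytic neoplasm
    "DOID:0000818": [],              # cell proliferation disease
}

closure = is_a_closure(PARENTS)
```

Each concept is walked independently here, which is simple but repeats work for shared ancestors. That repetition is one reason the closure is costly on large ontologies and has to be recomputed when an ontology changes.
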
Ontology-based annotation workflow
First, direct annotations are created by recognizing concepts in raw text.
Second, annotations are semantically expanded using knowledge of the ontologies.
Third, all annotations are scored according to the context in which they have been created.

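The third step can be sketched as a weighted sum per concept. The weights below are hypothetical values chosen only for illustration, and the function names are assumptions. The talk states only that the score depends on the context in which an annotation was created.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights, for illustration only (not taken from the talk).
DIRECT_WEIGHT = 10.0
MAPPING_WEIGHT = 7.0


def annotation_weight(context: str, level: int = 0) -> float:
    """Weight of one annotation given the context in which it was created."""
    if context == "direct":
        return DIRECT_WEIGHT
    if context == "mapping":
        return MAPPING_WEIGHT
    if context == "is_a":
        # Halve the weight for every is_a level walked up (level >= 1).
        return DIRECT_WEIGHT / (2 ** level)
    raise ValueError(f"unknown context: {context}")


def score(annotations: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Sum the annotation weights per concept for a single element."""
    totals: Dict[str, float] = defaultdict(float)
    for concept, context, level in annotations:
        totals[concept] += annotation_weight(context, level)
    return dict(totals)


# (concept, context, is_a level) triples for one element.
example = [
    ("concept-A", "direct", 0),
    ("concept-A", "direct", 0),
    ("concept-B", "is_a", 1),
    ("concept-C", "mapping", 0),
]
scores = score(example)
```

With these assumed weights, a concept recognized twice in the text outranks one reached only through expansion.
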
Concept recognition (step 1)

Uses a dictionary: a list of strings that identify ontology concepts
220 ontologies, ~4.2M concepts & ~7.9M terms
Uses NCIBI Mgrep, a syntactic concept recognizer
High degree of accuracy
Fast, scalable
Domain independent

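Dictionary-based recognition can be illustrated with a naive matcher. This sketch is not Mgrep and makes no claim about how Mgrep works. It only shows the idea of looking up dictionary terms in raw text. The function names are assumptions, and the plural term "Melanocytes" is added here for the example.

```python
import re
from typing import Dict, List, Tuple


def build_dictionary(concept_terms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lower-cased term to the concept it identifies."""
    return {
        term.lower(): concept_id
        for concept_id, terms in concept_terms.items()
        for term in terms
    }


def recognize(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """Return (concept_id, start, end) for each whole-word term match."""
    matches = []
    for term, concept_id in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((concept_id, m.start(), m.end()))
    return sorted(matches, key=lambda match: match[1])


dictionary = build_dictionary({
    "DOID:1909": ["Melanoma", "Melanomas", "Malignant melanoma"],
    "C0025201": ["Melanocyte", "Melanocytes"],  # plural form is an assumption
})
text = ("Melanoma is a malignant tumor of melanocytes which are found "
        "predominantly in skin but also in the bowel and the eye.")
direct_annotations = recognize(text, dictionary)
```

Scanning the text once per term does not scale to millions of terms, which is why a dedicated recognizer is used in practice.
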
Semantic expansion (step 2)

Uses is_a hierarchies defined by original ontologies
Uses mappings in the UMLS Metathesaurus and NCBO BioPortal
Uses semantic-similarity algorithms based on the is_a graph (ongoing work)
Components available as web services

An example

Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.

NCI/C0025201, Melanocyte in NCI Thesaurus
39228/DOID:1909, Melanoma in Human Disease
Is_a closure expansion:
39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease
39228/DOID:0000818, cell proliferation disease, grandparent of Melanoma in Human Disease
Mapping expansion:
FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS

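The expansion in this example can be reproduced with a short sketch. The concept identifiers, parent links and mapping are the ones listed in the example. The `Annotation` tuple, the `expand` function and the dictionary layout are assumptions made for illustration and do not describe the actual web services.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    concept: str
    context: str   # "direct", "is_a" or "mapping"
    source: str    # directly recognized concept it derives from
    level: int     # is_a distance from source (0 unless context is "is_a")


# Knowledge used for expansion, restricted to the example above.
PARENTS: Dict[str, List[str]] = {
    "39228/DOID:1909": ["39228/DOID:191"],
    "39228/DOID:191": ["39228/DOID:0000818"],
}
MAPPINGS: Dict[str, List[str]] = {"NCI/C0025201": ["FMA/C0025201"]}
DIRECT = ["NCI/C0025201", "39228/DOID:1909"]


def expand(direct, parents, mappings) -> List[Annotation]:
    """Add is_a and mapping annotations to the direct ones."""
    annotations = [Annotation(c, "direct", c, 0) for c in direct]
    for concept in direct:
        # Walk up the is_a hierarchy one level at a time.
        frontier, level, seen = list(parents.get(concept, [])), 1, set()
        while frontier:
            next_frontier = []
            for parent in frontier:
                if parent not in seen:
                    seen.add(parent)
                    annotations.append(Annotation(parent, "is_a", concept, level))
                    next_frontier.extend(parents.get(parent, []))
            frontier, level = next_frontier, level + 1
        # Follow mappings to equivalent concepts in other ontologies.
        for mapped in mappings.get(concept, []):
            annotations.append(Annotation(mapped, "mapping", concept, 0))
    return annotations


expanded = expand(DIRECT, PARENTS, MAPPINGS)
```

Recording the context and the is_a level on each expanded annotation keeps enough provenance for a later scoring step to treat a grandparent differently from a direct match.
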
