CTSAconnect
Reveal Connections. Realize Potential.
The VIVO Ontology and Integrated Semantic Framework
VIVO Implementation Fest, April 25, 2013
Jon Corson-Rikert, Nicholas Rejack, and Carlo Torniai
Talk Overview
• What is an ontology? Why use one?
• Ontology mechanics
• Evolution of the VIVO ontology
• Principles and design patterns
• Core vs. local extensions
• VIVO 1.6 and the Integrated Semantic Framework
What is an ontology?
• A representation,
• in both computer- and human-interpretable forms,
• of entities and relations
• comprising a part of reality
(a practical definition developed by the CTSA Ontology Affinity Group, February 2013)
Why use an ontology?
• Ontologies enable data interoperability independently of any one application
• For VIVO and eagle-i, the ontology drives the application
– The ontology provides a logical data model
– Additional properties (or separate ontologies) configure application behavior
– Vitro = VIVO software where you build or import the ontology
Ontology specification
http://www.w3.org/TR/2009/WD-owl2-primer-20090421/
• OWL 2 is a knowledge representation language, designed to formulate, exchange and reason with knowledge about a domain of interest
• OWL 2 denotes objects as individuals, categories as classes and relations as properties
• Object properties relate objects to objects (like a person to their spouse)
• Datatype properties assign data values to objects (like an age to a person)
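The distinction between object properties and datatype properties can be sketched in a few lines of plain Python. This is only an illustration, not how OWL tools represent data: the `Individual` class and the property names `hasSpouse` and `hasAge` are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """An OWL-style individual: a named object belonging to one or more classes."""
    name: str
    classes: set = field(default_factory=set)
    object_props: dict = field(default_factory=dict)    # property -> list of Individuals
    datatype_props: dict = field(default_factory=dict)  # property -> list of literal values

    def relate(self, prop, other):
        """Object property: relates this individual to another individual."""
        if not isinstance(other, Individual):
            raise TypeError("object properties must point at an Individual")
        self.object_props.setdefault(prop, []).append(other)

    def assign(self, prop, value):
        """Datatype property: assigns a literal data value to this individual."""
        if isinstance(value, Individual):
            raise TypeError("datatype properties must hold a literal value")
        self.datatype_props.setdefault(prop, []).append(value)


# Hypothetical individuals and property names, for illustration only.
alice = Individual("alice", classes={"Person"})
bob = Individual("bob", classes={"Person"})
alice.relate("hasSpouse", bob)  # object property: person -> person
alice.assign("hasAge", 42)      # datatype property: person -> data value
```

Keeping the two kinds of property in separate maps mirrors the OWL rule that a property is declared as one kind or the other.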
Populating an ontology
• The ontologies used in VIVO and eagle-i are intended to be populated with instance data
• RDF data
– Instances of classes (individuals)
– Property statements assigning data values to and relating these instances
– Individuals, classes, and properties all have URIs so they can be directly addressable on the Web
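As a rough sketch of what populated RDF data looks like, the snippet below writes three statements in N-Triples form: one typing an individual, one assigning a data value, and one relating two individuals. The `http://vivo.example.edu/individual/` namespace and the identifiers `n1234` and `n5678` are hypothetical; the RDF, RDFS, and FOAF URIs are the standard ones. Literal escaping is deliberately minimal.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
EX = "http://vivo.example.edu/individual/"  # hypothetical local namespace


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal (escapes only backslashes and double quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(EX + "n1234"), uri(RDF_TYPE), uri(FOAF_PERSON)),         # instance of a class
    (uri(EX + "n1234"), uri(RDFS_LABEL), literal("Doe, Jane")),   # data value
    (uri(EX + "n1234"), uri(FOAF_KNOWS), uri(EX + "n5678")),      # relation to another instance
]

print(to_ntriples(triples))
```

Every subject, predicate, and non-literal object is a URI, which is what makes each of them addressable on the Web.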
Basic principles of linked data
• Use URIs as names for things
• Use HTTP URIs so people can look up those names
• Provide useful information from a URI in a standard format
• Include links to other URIs
http://www.w3.org/DesignIssues/LinkedData.html
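Looking up a name typically means dereferencing its HTTP URI and asking for a machine-readable format through the `Accept` header. The sketch below builds such a request with Python's standard library. The individual URI is hypothetical, and whether a given server honors the requested media type is an assumption.

```python
import urllib.request


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build a GET request that asks the server for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri, timeout=10):
    """Dereference the URI and return the response body as text.

    Assumes the server supports content negotiation for the requested type.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical individual URI; the request is built here but not sent.
request = build_rdf_request("http://vivo.example.edu/individual/n1234")
```

The returned RDF would normally contain further URIs, which a client can dereference the same way to follow links.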
Core vs. local extensions
• The VIVO core ontology needs to be consistent to support linked data and search
• Extensions are best confined to subclasses or subproperties that “roll up” into VIVO core
• Or they may represent information of purely local value (e.g., local identifiers)
• In practice, local extensions sometimes lead to additions to core
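The "roll up" idea can be sketched as a walk up a subclass hierarchy: data typed with a local class still counts as an instance of every core class above it. The class names below are invented for illustration and are not taken from the VIVO ontology.

```python
# Direct subclass links: class -> set of direct superclasses.
# All names here are hypothetical.
SUBCLASS_OF = {
    "local:ExtensionEducator": {"core:FacultyMember"},
    "core:FacultyMember": {"core:Person"},
    "core:Person": {"core:Agent"},
}


def ancestors(cls, subclass_of):
    """Return every superclass reachable from cls (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def rolls_up_to(cls, core_class, subclass_of):
    """True if cls is core_class itself or one of its (indirect) subclasses."""
    return cls == core_class or core_class in ancestors(cls, subclass_of)
```

A search or linked-data consumer that only knows `core:Person` can then still find individuals typed as `local:ExtensionEducator`, because `rolls_up_to("local:ExtensionEducator", "core:Person", SUBCLASS_OF)` holds.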
Evolution of the VIVO ontology
• Before 2007, VIVO used a relational database structure emulating the AKT ontology from the UK
– http://www.aktors.org/publications/ontology/
• VIVO converted to OWL and RDF in 2007
• The NIH project motivated a fresh start in 2009 that drew extensively from other ontologies (notably BIBO and FOAF)
• In 2010 and 2011, VIVO collaborated with eagle-i to align under the BFO upper ontology and add classes to represent scientific resources
• The 2012-2013 CTSAconnect project completes this transition
CTSAconnect and the ISF
• VIVO and eagle-i team members won NIH funding in 2012 for a project to unify their ontologies and extend both in the clinical domain
• The unified ontology is known as the Integrated Semantic Framework, or ISF
• VIVO 1.6 and eagle-i’s next release will use the ISF
• This combined ontology is modular to allow selective data population based on local needs
Relating researchers across disciplines
VIVO and ISF principles
• Reuse existing ontologies, in whole or in part
• Leverage the structure of an upper-level ontology to provide consistency as ontologies are extended
• Ontologies addressing different domains should be self-contained and ‘orthogonal’: capable of being used on their own or linking together without redundant overlap
• Develop ontologies in modules for selective adoption
Other good practices
• Represent what exists in the world and what you know about it
– Ontology realism
• Model an ontology on the data you have in hand, not detail you might someday get
• Test your ontology with real data
• Avoid confounding the logical data ontology with application-specific requirements
The ontology and the app
• It’s tempting to change the ontology to improve application behavior
– E.g., to limit author pick lists to members of the class Person
• eagle-i uses annotation properties to control application and search behavior
• VIVO 1.6 will implement an application configuration ontology
– The UI may not be mature
VIVO and ISF design patterns
• Alignment under the Basic Formal Ontology (BFO)
– Fundamental division between continuants and occurrents
– Useful discipline around relationships, roles, and processes
• Heavy reliance on reified relationships
– Relationships that have their own attributes, frequently including temporal bounds
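A reified relationship turns the link between two things into a thing of its own, so the link can carry attributes such as a title or start and end dates. The sketch below models a person's position at an organization this way. The `Position` class, its field names, and the sample data are illustrative assumptions, not the VIVO or ISF class definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    """A reified person-organization relationship with its own attributes."""
    person: str
    organization: str
    title: str
    start_year: int
    end_year: Optional[int] = None  # None means the position is ongoing

    def active_in(self, year):
        """True if the position's temporal bounds include the given year."""
        return self.start_year <= year and (self.end_year is None or year <= self.end_year)


def employers_in(person, year, positions):
    """Organizations where the person held an active position in that year."""
    return sorted({p.organization for p in positions
                   if p.person == person and p.active_in(year)})


# Hypothetical sample data.
positions = [
    Position("jane", "Library", "Programmer", 2003, 2008),
    Position("jane", "Medical School", "Data Curator", 2008),
    Position("raj", "Library", "Director", 2001),
]
```

A direct person-to-organization property could not say when the relationship held; the intermediate `Position` node can, at the cost of one extra hop when querying.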
Ontologies in the LOD context
• Closer alignment with existing widely used ontologies
– E.g., VCard
– W3C Org ontology
• Option for populating shortcut relations across VIVO context nodes
• Identifier crosswalks
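One way to read "shortcut relations across context nodes" is as derived triples that connect the two ends of a context node directly, so consumers of the linked data do not need to traverse the intermediate node. The sketch below shows that derivation over plain tuples. The property names (`ex:hasPosition`, `ex:positionInOrg`, `ex:memberOf`) are invented for the example and are not VIVO terms.

```python
def derive_shortcuts(triples, into_node, out_of_node, shortcut):
    """Derive (s, shortcut, target) for each path s -into_node-> node -out_of_node-> target."""
    node_targets = {}
    for s, p, o in triples:
        if p == out_of_node:
            node_targets.setdefault(s, []).append(o)

    derived = set()
    for s, p, o in triples:
        if p == into_node:
            for target in node_targets.get(o, []):
                derived.add((s, shortcut, target))
    return derived


# Hypothetical data: a person linked to an organization through a position node.
triples = [
    ("ex:jane", "ex:hasPosition", "ex:position1"),
    ("ex:position1", "ex:positionInOrg", "ex:library"),
    ("ex:raj", "ex:hasPosition", "ex:position2"),
]

shortcuts = derive_shortcuts(triples, "ex:hasPosition", "ex:positionInOrg", "ex:memberOf")
```

The shortcut drops whatever attributes the context node carried, such as dates, so it complements the reified form and does not replace it.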
The unknown author problem
• VIVO connects a publication with an author via an authorship relationship
– Provides a way to store author rank
– Provides an “authorAsListed” property to record the exact format of the name
• We have recommended creating linked foaf:Person records for each author
– This allows storage of name parts and/or affiliation without duplicating properties
– But it adds a large number of unknown person records, and implies you know more than you do
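The authorship pattern can be sketched as a small record holding the rank and the name exactly as listed, with the link to a person left optional. The `Authorship` class and its field names are illustrative stand-ins for the VIVO terms, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Authorship:
    """A publication-author relationship with its own attributes."""
    publication: str
    rank: int                     # position in the author list
    author_as_listed: str         # the name exactly as printed
    person: Optional[str] = None  # URI of a known person; None if unknown


def byline(authorships, publication):
    """Rebuild the author list in rank order from the names as listed."""
    entries = sorted((a for a in authorships if a.publication == publication),
                     key=lambda a: a.rank)
    return "; ".join(a.author_as_listed for a in entries)


def unknown_authors(authorships):
    """Authorships that are not linked to any known person."""
    return [a for a in authorships if a.person is None]


# Hypothetical sample data.
authorships = [
    Authorship("ex:pub1", 2, "Smith, J."),
    Authorship("ex:pub1", 1, "Doe, Jane A.", person="ex:jane"),
    Authorship("ex:pub1", 3, "Lee, K.", person="ex:klee"),
]
```

Leaving `person` empty keeps the byline intact without creating a person record for "Smith, J.", which is the trade-off described above: no false claim of knowledge, but nowhere to store name parts or affiliation.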
Internal thing and unknown persons
• In browsing VIVO, users expect to see people and organizations local to their institution
• Similar expectations are voiced for searching
• VIVO has mechanisms to privilege internal organizations and people and hide others, but there must be a better way
• The ISF expands the vivo:Address into a VCard object that will be a useful alternative to foaf:Person for unknown authors
VCards for disambiguation
• VCards will be useful to represent name variants in the combinations actually appearing, together with affiliation information or an email address
• Distinguishing VCards from Persons will prevent confusion and reduce false matches
• Allows separate processing to look at the universe of VCards with respect to the universe of known persons
– In one VIVO, but potentially across many or with reference to ORCID
Internal vs. external vocabularies
• Very early recognition that VIVO should not import large controlled vocabularies
– They vary by domain
– They change
– In many cases they have stable URIs (LOC, Agrovoc, NALT, GEMET)
• With 1.4 VIVO added lookup services developed in concert with Stony Brook
– UMLS (hosted by Stony Brook) and GEMET
– VIVO stores the remote concept with its external URI
Expanding VIVO lookup services
• Additional vocabularies
– LCSH, Agrovoc, NAL Thesaurus, …
– Developing a service template for vocabularies to adopt?
• Authority services
– People (orcid.org)
– Organizations (viaf.org)
– Events?
• Leveraging vivosearch.org
– Linking from one VIVO to another
Special areas of focus at the I-Fest
• Knowledge mobilization
• Representing the humanities
• Linking people and
– Publications
– Grants
– Facilities and other research resources
– Datasets
CTSAconnect Team
CTSA 10-001: 100928SB23 PROJECT #: 00921-0001
OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Shahim Essaid, Eric Orwoll
Cornell University: Jon Corson-Rikert, Dean Krafft, Brian Lowe
University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
Harvard University: Daniela Bourges-Waldegg, Sophia Cheng
Share Center: Chris Kelleher, Will Corbett, Ranjit Das, Ben Sharma
University at Buffalo: Barry Smith, Dagobert Soergel
CTSAconnect project: ctsaconnect.org
The clinical module source: http://bit.ly/clinical-isf
CTSAconnect ontology source: http://code.google.com/p/connect-isf/
Resources