
CTSAconnect

Reveal Connections. Realize Potential.  

   

The VIVO Ontology and Integrated Semantic Framework

VIVO Implementation Fest, April 25, 2013

Jon Corson-Rikert, Nicholas Rejack, and Carlo Torniai


Talk Overview

•  What is an ontology? Why use one?
•  Ontology mechanics
•  Evolution of the VIVO ontology
•  Principles and design patterns
•  Core vs. local extensions
•  VIVO 1.6 and the Integrated Semantic Framework


What is an ontology?

•  A representation,
•  in both computer- and human-interpretable forms,
•  of entities and relations
•  comprising a part of reality

(a practical definition developed by the CTSA Ontology Affinity Group, February 2013)


Why use an ontology?

•  Ontologies enable data interoperability independent of any one application
•  For VIVO and eagle-i, the ontology drives the application
   – The ontology provides a logical data model
   – Additional properties (or separate ontologies) configure application behavior
   – Vitro: the VIVO software in which you build or import the ontology


Ontology specification: http://www.w3.org/TR/2009/WD-owl2-primer-20090421/

•  OWL 2 is a knowledge representation language, designed to formulate, exchange and reason with knowledge about a domain of interest
•  OWL 2 denotes objects as individuals, categories as classes and relations as properties
•  Object properties relate objects to objects (like a person to their spouse)
•  Datatype properties assign data values to objects (like an age to a person)
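To make the object/datatype distinction concrete, here is a minimal sketch in plain Python (not an OWL reasoner). The namespace `http://example.org/` and the properties `hasSpouse` and `hasAge` are hypothetical stand-ins for the spouse and age examples; the check simply confirms that an object property points to an individual and a datatype property points to a literal.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical namespace for this illustration only.
EX = "http://example.org/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class IRI:
    """A named resource: an individual, a class, or a property."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A data value together with its XML Schema datatype."""
    lexical: str
    datatype: str


Term = Union[IRI, Literal]

# Hypothetical declarations echoing the spouse and age examples above.
OBJECT_PROPERTIES = {EX + "hasSpouse"}
DATATYPE_PROPERTIES = {EX + "hasAge"}


def well_typed(predicate: IRI, obj: Term) -> bool:
    """Object properties point to individuals; datatype properties point to literals."""
    if predicate.value in OBJECT_PROPERTIES:
        return isinstance(obj, IRI)
    if predicate.value in DATATYPE_PROPERTIES:
        return isinstance(obj, Literal)
    raise KeyError(f"undeclared property: {predicate.value}")


mary = IRI(EX + "Mary")
print(well_typed(IRI(EX + "hasSpouse"), mary))                     # True
print(well_typed(IRI(EX + "hasAge"), Literal("51", XSD_INTEGER)))  # True
print(well_typed(IRI(EX + "hasAge"), mary))                        # False
```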


Populating an ontology

•  The ontologies used in VIVO and eagle-i are designed to be populated with data
•  RDF data
   – Instances of classes (individuals)
   – Property statements assigning data values to these instances and relating them to each other
   – Individuals, classes, and properties all have URIs, so they are directly addressable on the Web
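As a rough illustration of populated data, the sketch below writes three statements about one individual in N-Triples syntax: a class membership, a data value, and a relation to another individual. The RDF, RDFS, and FOAF URIs are real; the institutional namespace `http://vivo.example.edu/individual/` and the `memberOf` property are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
# Hypothetical: an institution's namespace and a relating property.
IND = "http://vivo.example.edu/individual/"
MEMBER_OF = "http://example.org/memberOf"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    text: str


Triple = Tuple[IRI, IRI, Union[IRI, Literal]]


def term_to_nt(term: Union[IRI, Literal]) -> str:
    """Serialize one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape backslashes first so the later escapes are not doubled.
    escaped = (term.text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """One statement per line: subject, predicate, object, terminated by ' .'."""
    return "\n".join(" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples)


person, dept = IRI(IND + "n1234"), IRI(IND + "n5678")
data: List[Triple] = [
    (person, IRI(RDF_TYPE), IRI(FOAF_PERSON)),        # an instance of a class
    (person, IRI(RDFS_LABEL), Literal("Doe, Jane")),  # assigns a data value
    (person, IRI(MEMBER_OF), dept),                   # relates two instances
]
print(to_ntriples(data))
```

Every subject, predicate, and non-literal object is a full URI, which is what makes each statement addressable outside the application that produced it.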

 


Basic principles of linked data

•  Use URIs as names for things
•  Use HTTP URIs so people can look up those names
•  Provide useful information from a URI in a standard format
•  Include links to other URIs

http://www.w3.org/DesignIssues/LinkedData.html
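A minimal sketch of "looking up a name" with the Python standard library: request an HTTP URI while asking, via the `Accept` header, for RDF instead of an HTML page. The individual URI is hypothetical, and which serializations a particular server actually returns is an assumption; `urlopen` follows redirects such as 303 See Other by default.

```python
import urllib.request
from typing import Tuple

# Preferred RDF serializations; which ones a given server honors is an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request for a linked data URI, asking for RDF rather than HTML."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("linked data names should be HTTP(S) URIs")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> Tuple[str, str]:
    """Look up a URI (following redirects) and return (content type, body)."""
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.headers.get_content_type(), response.read().decode(charset)


# Hypothetical individual URI; calling dereference() requires network access.
req = rdf_request("http://vivo.example.edu/individual/n1234")
print(req.get_header("Accept"))
```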


RDF  “triples”  


Relationships create networks
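Because every statement between two individuals is an edge, a set of triples can be walked as a network. This sketch (hypothetical `ex:` names, plain strings rather than full URIs) treats such statements as undirected edges and uses breadth-first search to find how two people are connected through a shared publication and a shared grant.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def build_graph(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Treat each statement between two individuals as an undirected edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node sequence, or None if unconnected."""
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph.get(node, set())):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical object-property statements (no literals).
triples = [
    ("ex:alice", "ex:authorOf", "ex:paper1"),
    ("ex:bob", "ex:authorOf", "ex:paper1"),
    ("ex:bob", "ex:investigatorOn", "ex:grant7"),
    ("ex:carol", "ex:investigatorOn", "ex:grant7"),
]
graph = build_graph(triples)
print(shortest_path(graph, "ex:alice", "ex:carol"))
# ['ex:alice', 'ex:paper1', 'ex:bob', 'ex:grant7', 'ex:carol']
```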


Core vs. local extensions

•  The VIVO core ontology needs to be consistent to support linked data and search
•  Extensions are best confined to subclasses or subproperties that "roll up" into VIVO core
•  Or they may represent information of purely local value (e.g., local identifiers)
•  In practice, local extensions sometimes lead to additions to core
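"Rolling up" can be pictured as the transitive closure of `rdfs:subClassOf`: an individual typed with a local subclass is also an instance of every core class above it, so core-level search and linked data consumers still find it. In this sketch `local:ClinicalFaculty` and `local:Widget` are hypothetical, and the core hierarchy is reduced to three classes for illustration.

```python
from typing import Dict, Set

# rdfs:subClassOf assertions. "local:" classes are hypothetical extensions;
# vivo:FacultyMember, foaf:Person and foaf:Agent stand in for core classes.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "local:ClinicalFaculty": {"vivo:FacultyMember"},
    "vivo:FacultyMember": {"foaf:Person"},
    "foaf:Person": {"foaf:Agent"},
}
CORE_CLASSES = {"vivo:FacultyMember", "foaf:Person", "foaf:Agent"}


def superclasses(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Transitive closure of rdfs:subClassOf starting from cls."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def core_types(cls: str) -> Set[str]:
    """The core classes an individual of type cls is also an instance of."""
    return ({cls} | superclasses(cls, SUBCLASS_OF)) & CORE_CLASSES


print(sorted(core_types("local:ClinicalFaculty")))  # ['foaf:Agent', 'foaf:Person', 'vivo:FacultyMember']
print(core_types("local:Widget"))                    # set()
```

A local class that does not roll up (like `local:Widget`) is invisible at the core level, which is acceptable only for information of purely local value.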


Evolution of the VIVO ontology

•  Before 2007, VIVO used a relational database structure emulating the AKT ontology from the UK
   – http://www.aktors.org/publications/ontology/
•  VIVO converted to OWL and RDF in 2007
•  The NIH project motivated a fresh start in 2009 that drew extensively from other ontologies (notably BIBO and FOAF)
•  In 2010 and 2011, VIVO collaborated with eagle-i to align under the BFO upper ontology and add classes to represent scientific resources
•  The 2012-2013 CTSAconnect project completes this transition


CTSAconnect and the ISF

•  VIVO and eagle-i team members won NIH funding in 2012 for a project to unify their ontologies and extend both in the clinical domain
•  The unified ontology is known as the Integrated Semantic Framework, or ISF
•  VIVO 1.6 and eagle-i's next release will use the ISF
•  This combined ontology is modular to allow selective data population based on local needs

At the intersection of VIVO and eagle-i



Relating researchers across disciplines


VIVO and ISF principles

•  Reuse existing ontologies, in whole or in part
•  Leverage the structure of an upper-level ontology to provide consistency as ontologies are extended
•  Ontologies addressing different domains should be self-contained and 'orthogonal': capable of being used on their own or linked together without redundant overlap
•  Develop ontologies in modules for selective adoption


Other good practices

•  Represent what exists in the world and what you know about it
   – Ontology realism
•  Model an ontology on the data you have in hand, not on detail you might someday get
•  Test your ontology with real data
•  Avoid confounding the logical data ontology with application-specific requirements
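One simple test against real data, sketched here under the assumption that terms are compared as plain prefixed-name strings: list every class and property a data sample uses that the ontology does not declare. The ontology terms and sample statements below are hypothetical; a misspelled class and an unmodeled property both surface immediately.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def undeclared_terms(triples: Iterable[Triple], classes: Set[str],
                     properties: Set[str]) -> List[Tuple[str, str]]:
    """List classes and properties used in the data but missing from the ontology."""
    problems = set()
    for _subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            if obj not in classes:
                problems.add(("class", obj))
        elif predicate not in properties:
            problems.add(("property", predicate))
    return sorted(problems)


# Hypothetical ontology terms and a small sample of "real" data.
ontology_classes = {"foaf:Person", "vivo:FacultyMember"}
ontology_properties = {"rdfs:label"}
sample = [
    ("ind:n1", "rdf:type", "foaf:Person"),
    ("ind:n1", "rdf:type", "vivo:Faculty"),        # misspelled class
    ("ind:n1", "rdfs:label", '"Doe, Jane"'),
    ("ind:n1", "local:officeHours", '"Tue 2-4"'),  # property never modeled
]
print(undeclared_terms(sample, ontology_classes, ontology_properties))
# [('class', 'vivo:Faculty'), ('property', 'local:officeHours')]
```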


The ontology and the app

•  It's tempting to change the ontology to improve application behavior
   – E.g., to limit author pick lists to members of the class Person
•  eagle-i uses annotation properties to control application and search behavior
•  VIVO 1.6 will implement an application configuration ontology
   – Its UI may not yet be mature
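The pick-list example can be handled without touching the ontology by keeping the restriction in separate configuration. In this minimal sketch, `PICKLIST_CONFIG` and the property `ex:hasAuthor` are hypothetical; they are not VIVO's actual configuration format. The ontology's own range for the property stays as broad as the data requires.

```python
from typing import Dict, List, Set

# Hypothetical application-configuration entry, stored apart from the ontology:
# when editing ex:hasAuthor, offer only individuals typed foaf:Person.
PICKLIST_CONFIG: Dict[str, str] = {"ex:hasAuthor": "foaf:Person"}


def pick_list(prop: str, individuals: Dict[str, Set[str]],
              config: Dict[str, str]) -> List[str]:
    """Candidates for a property's pick list; unconfigured properties offer everything."""
    wanted = config.get(prop)
    if wanted is None:
        return sorted(individuals)
    return sorted(uri for uri, types in individuals.items() if wanted in types)


# Types are assumed to already include inferred superclasses.
individuals = {
    "ind:jane": {"foaf:Person", "vivo:FacultyMember"},
    "ind:lab": {"foaf:Organization"},
    "ind:joe": {"foaf:Person"},
}
print(pick_list("ex:hasAuthor", individuals, PICKLIST_CONFIG))   # ['ind:jane', 'ind:joe']
print(pick_list("ex:hasSponsor", individuals, PICKLIST_CONFIG))  # ['ind:jane', 'ind:joe', 'ind:lab']
```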


VIVO and ISF design patterns

•  Alignment under the Basic Formal Ontology (BFO)
   – Fundamental division between continuants and occurrents
   – Useful discipline around relationships, roles, and processes
•  Heavy reliance on reified relationships
   – Relationships that have their own attributes, frequently including temporal bounds

ISF design patterns
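A direct person-to-organization property cannot say when the relationship held. Reifying it as its own node lets it carry a title and start and end dates. This is a minimal sketch with hypothetical identifiers and field names; it does not reproduce the actual VIVO or ISF classes for positions and date-time intervals.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    """A relationship reified as its own node, so it can carry attributes."""
    person: str
    organization: str
    title: str
    start: date
    end: Optional[date] = None  # None means the position is ongoing

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def titles_on(person: str, day: date, positions: List[Position]) -> List[str]:
    """Titles a person held on a given day."""
    return [p.title for p in positions if p.person == person and p.active_on(day)]


# Hypothetical data: two successive positions in the same department.
positions = [
    Position("ind:jane", "ind:chem", "Assistant Professor", date(2005, 7, 1), date(2011, 6, 30)),
    Position("ind:jane", "ind:chem", "Associate Professor", date(2011, 7, 1)),
]
print(titles_on("ind:jane", date(2013, 4, 25), positions))  # ['Associate Professor']
print(titles_on("ind:jane", date(2008, 1, 1), positions))   # ['Assistant Professor']
```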


Ontologies in the LOD context

•  Closer alignment with existing widely used ontologies
   – E.g., VCard
   – W3C Org ontology
•  Option for populating shortcut relations across VIVO context nodes
•  Identifier crosswalks
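A shortcut relation is derived from a two-hop path through a context node: if A points to context C and C points to B, also state A directly related to B. In this sketch the predicates `ex:hasAuthorship`, `ex:linkedPublication`, and `ex:authorOf` are hypothetical placeholders; because the shortcuts are fully derivable, they can be regenerated whenever the context nodes change.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def materialize_shortcuts(triples: Iterable[Triple], to_context: str,
                          from_context: str, shortcut: str) -> Set[Triple]:
    """For every a -to_context-> ctx -from_context-> b, produce a -shortcut-> b."""
    triples = list(triples)
    # Index the second hop: context node -> targets.
    outgoing: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == from_context:
            outgoing.setdefault(s, set()).add(o)
    shortcuts: Set[Triple] = set()
    for s, p, o in triples:
        if p == to_context:
            for target in outgoing.get(o, set()):
                shortcuts.add((s, shortcut, target))
    return shortcuts


# Hypothetical data: two authorship context nodes for one publication.
data = [
    ("ind:jane", "ex:hasAuthorship", "ind:authorship1"),
    ("ind:authorship1", "ex:linkedPublication", "ind:paper9"),
    ("ind:joe", "ex:hasAuthorship", "ind:authorship2"),
    ("ind:authorship2", "ex:linkedPublication", "ind:paper9"),
]
for t in sorted(materialize_shortcuts(data, "ex:hasAuthorship", "ex:linkedPublication", "ex:authorOf")):
    print(t)
```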


The unknown author problem

•  VIVO connects a publication with an author via an authorship relationship
   – Provides a way to store author rank
   – Provides an "authorAsListed" property to record the exact format of the name
•  We have recommended creating linked foaf:Person records for each author
   – This allows storage of name parts and/or affiliation without duplicating properties
   – But it adds a large number of unknown person records, and implies you know more than you do
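The authorship node alone is enough to rebuild a byline in order, since it stores the rank and the name as listed; a linked person record is optional. This minimal sketch uses hypothetical field names (`rank`, `author_as_listed`, `person`) and identifiers, and also shows how to find the authors who have no linked person.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Authorship:
    """Context node between a publication and (optionally) a known person."""
    publication: str
    rank: int
    author_as_listed: str
    person: Optional[str] = None  # linked person record, if one exists


def byline(authorships: List[Authorship], publication: str) -> str:
    """Names exactly as listed, in author-rank order."""
    ordered = sorted((a for a in authorships if a.publication == publication),
                     key=lambda a: a.rank)
    return "; ".join(a.author_as_listed for a in ordered)


def unlinked_authors(authorships: List[Authorship], publication: str) -> List[str]:
    """Authors recorded only by name, with no person record."""
    return [a.author_as_listed for a in sorted(authorships, key=lambda a: a.rank)
            if a.publication == publication and a.person is None]


# Hypothetical authorships for one paper.
records = [
    Authorship("ind:paper9", 2, "Smith, J."),
    Authorship("ind:paper9", 1, "Doe, Jane", "ind:jane"),
]
print(byline(records, "ind:paper9"))            # Doe, Jane; Smith, J.
print(unlinked_authors(records, "ind:paper9"))  # ['Smith, J.']
```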


Internal things and unknown persons

•  In browsing VIVO, users expect to see people and organizations local to their institution
•  Similar expectations are voiced for searching
•  VIVO has mechanisms to privilege internal organizations and people and hide others, but there must be a better way
•  The ISF expands the vivo:Address into a VCard object that will be a useful alternative to foaf:Person for unknown authors


The VCard ontology


VCards for disambiguation

•  VCards will be useful to represent name variants in the combinations actually appearing, together with affiliation information or an email address
•  Distinguishing VCards from Persons will prevent confusion and reduce false matches
•  This allows separate processing to compare the universe of VCards with the universe of known persons
   – In one VIVO, but potentially across many or with reference to ORCID
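Keeping VCards separate from Persons means matching can be a distinct, cautious step. The rules below are a hypothetical sketch, not a VIVO algorithm: a shared email address decides on its own; otherwise the family names must match and the VCard's given name must be a prefix of the person's (so "J." fits "Jane"). A VCard is linked only when exactly one person qualifies, and otherwise stays unlinked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    """A known person in this VIVO."""
    uri: str
    family_name: str
    given_name: str
    email: Optional[str] = None


@dataclass(frozen=True)
class VCard:
    """A name as it actually appeared, kept distinct from any Person."""
    uri: str
    family_name: str
    given_name: str
    email: Optional[str] = None


def _norm(text: str) -> str:
    """Lowercase and keep letters only, so 'J.' and 'j' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def candidates(card: VCard, people: List[Person]) -> List[str]:
    # A shared email address is strong evidence, so it decides on its own.
    if card.email:
        hits = [p.uri for p in people if p.email and p.email.lower() == card.email.lower()]
        if hits:
            return hits
    family, given = _norm(card.family_name), _norm(card.given_name)
    if not given:
        return []
    return [p.uri for p in people
            if _norm(p.family_name) == family and _norm(p.given_name).startswith(given)]


def resolve(card: VCard, people: List[Person]) -> Optional[str]:
    """Link only on a unique candidate; ambiguous or unmatched VCards stay unlinked."""
    found = candidates(card, people)
    return found[0] if len(found) == 1 else None


# Hypothetical known persons.
people = [Person("ind:jane", "Doe", "Jane", "jdoe@example.edu"), Person("ind:john", "Doe", "John")]
print(resolve(VCard("vc:1", "Doe", "J."), people))                      # None (ambiguous)
print(resolve(VCard("vc:2", "Doe", "J.", "JDoe@example.edu"), people))  # ind:jane
```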


Internal vs. external vocabularies

•  We recognized very early that VIVO should not import large controlled vocabularies
   – They vary by domain
   – They change
   – In many cases they have stable URIs (LOC, Agrovoc, NALT, GEMET)
•  With 1.4, VIVO added lookup services developed in concert with Stony Brook
   – UMLS (hosted by Stony Brook) and GEMET
   – VIVO stores the remote concept with its external URI
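Storing "the remote concept with its external URI" means recording only the selected term, under the vocabulary's own URI, plus a cached label for display and search. In this sketch the concept URI and the `hasResearchArea` predicate are hypothetical, and typing the concept as `skos:Concept` is an assumption about how such terms are typed.

```python
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
# Hypothetical predicate linking a person to a research area.
HAS_RESEARCH_AREA = "http://example.org/hasResearchArea"

Triple = Tuple[str, str, str]


def link_external_concept(person_uri: str, concept_uri: str, label: str) -> List[Triple]:
    """Record one selected concept under its external URI, not the whole vocabulary."""
    if not concept_uri.startswith(("http://", "https://")):
        raise ValueError("expected the vocabulary's own HTTP URI, not a locally minted one")
    return [
        (person_uri, HAS_RESEARCH_AREA, concept_uri),
        (concept_uri, RDF_TYPE, SKOS_CONCEPT),
        (concept_uri, RDFS_LABEL, label),  # cached label for display and search
    ]


# Hypothetical person and concept URIs, as a lookup service might return them.
for t in link_external_concept("http://vivo.example.edu/individual/n1234",
                               "http://vocab.example.org/concept/42", "Soil science"):
    print(t)
```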


Expanding VIVO lookup services

•  Additional vocabularies
   – LCSH, Agrovoc, NAL Thesaurus, …
   – Developing a service template for vocabularies to adopt?
•  Authority services
   – People (orcid.org)
   – Organizations (viaf.org)
   – Events?
•  Leveraging vivosearch.org
   – Linking from one VIVO to another
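Before storing an identifier returned by an authority service, a cheap local check can catch typos. ORCID iDs end in an ISO 7064 MOD 11-2 check character. This sketch validates that check character; it does not confirm the iD is registered, which would still require a lookup against orcid.org.

```python
import re

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Format check plus checksum; says nothing about whether the iD is registered."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:-1]) == digits[-1]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```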


Special areas of focus at the I-Fest

•  Knowledge mobilization
•  Representing the humanities
•  Linking people and
   – Publications
   – Grants
   – Facilities and other research resources
   – Datasets


CTSAconnect Team

CTSA 10-001: 100928SB23 PROJECT #: 00921-0001

OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Shahim Essaid, Eric Orwoll
Cornell University: Jon Corson-Rikert, Dean Krafft, Brian Lowe
University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
Harvard University: Daniela Bourges-Waldegg, Sophia Cheng
Share Center: Chris Kelleher, Will Corbett, Ranjit Das, Ben Sharma
University at Buffalo: Barry Smith, Dagobert Soergel

Resources

•  CTSAconnect project: ctsaconnect.org
•  The clinical module source: http://bit.ly/clinical-isf
•  CTSAconnect ontology source: http://code.google.com/p/connect-isf/