16
Undefined 1 (2014) 1–5 1 IOS Press Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the Web Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name Surname, University, Country Pierre-Yves Vandenbussche a,* , Ghislain A. Atemezing b , María Poveda-Villalón c and Bernard Vatant d a Fujitsu (Ireland) Limited, Swords, Co. Dublin, Ireland E-mail: [email protected] b Mondeca, 35 boulevard de Strasbourg, 75010 Paris, France E-mail: [email protected] c Ontology Engineering Group (OEG), Universidad Politécnica de Madrid, Madrid, Spain E-mail: mpoveda@fi.upm.es d Mondeca, 35 boulevard de Strasbourg, 75010 Paris, France E-mail: [email protected] Abstract. One of the major barriers to the deployment of Linked Data is the difficulty that data publishers have in determining which vocabularies to use to describe the semantics of data. This system report describes Linked Open Vocabularies (LOV), a high quality catalogue of reusable vocabularies for the description of data on the Web. The LOV initiative gathers and makes visible indicators that have not been previously harvested such as the interconnections between vocabularies, version history along with past and current referent (individual or organization). LOV goes beyond existing Semantic Web vocabulary search engines and takes into consideration the value’s property type, matched with a query, to improve vocabulary terms scoring. By providing an extensive range of data access methods (SPARQL endpoint, API, data dump or UI), we try to facilitate the reuse of well-documented vocabularies in the Linked Data ecosystem. We conclude that the adoption in many applications and methods of LOV shows the benefits of such a set of vocabularies and related features to aid the design and publication of data on the Web. Keywords: LOV, Linked Open Vocabularies, Ontology search, Linked Data, Vocabulary catalogue 1. Introduction The last two decades has seen the emergence of a “Semantic Web” enabling humans and computer systems to exchange data with unambiguous, shared meaning. This vision has been supported by World Wide Web Consortium (W3C) Recommendations such as the Resource Description Framework (RDF), RDF- Schema and the Web Ontology Language (OWL). Thanks to a major effort in publishing data following Semantic Web and Linked Data principles [5], there * Thanks to Amélie Gyrard, Thomas Francart, Thérèze Rogez and Anthony McCauley for their help on the project. are now tens of billions of facts spanning hundreds of linked datasets on the Web covering a wide range of topics. Access to the data is facilitated by portals (such as Datahub 1 or UK Government Data 2 ) or by publish- ing the data directly (such as New York Times 3 ). Despite the enormous volumes of data now available on the Web, Linked Data suffers from low community interest in vocabulary 4 management in favour of the 1 http://datahub.io/ 2 http://data.gov.uk/ 3 http://data.nytimes.com/ 4 We use the terms “semantic vocabulary”, “vocabulary” and “on- tology” interchangeably. 0000-0000/14/$00.00 © 2014 – IOS Press and the authors. All rights reserved

Linked Open Vocabularies (LOV): a gateway to reusable semantic

Embed Size (px)

Citation preview

Undefined 1 (2014) 1ndash5 1IOS Press

Linked Open Vocabularies (LOV) a gatewayto reusable semantic vocabularies on the WebEditor(s) Name Surname University CountrySolicited review(s) Name Surname University CountryOpen review(s) Name Surname University Country

Pierre-Yves Vandenbussche alowast Ghislain A Atemezing b Mariacutea Poveda-Villaloacuten c and Bernard Vatant da Fujitsu (Ireland) Limited Swords Co Dublin IrelandE-mail pierre-yvesvandenbusscheiefujitsucomb Mondeca 35 boulevard de Strasbourg 75010 Paris FranceE-mail ghislainatemezingmondecacomc Ontology Engineering Group (OEG) Universidad Politeacutecnica de Madrid Madrid SpainE-mail mpovedafiupmesd Mondeca 35 boulevard de Strasbourg 75010 Paris FranceE-mail bernardvatantmondecacom

Abstract One of the major barriers to the deployment of Linked Data is the difficulty that data publishers have in determiningwhich vocabularies to use to describe the semantics of data This system report describes Linked Open Vocabularies (LOV) ahigh quality catalogue of reusable vocabularies for the description of data on the Web The LOV initiative gathers and makesvisible indicators that have not been previously harvested such as the interconnections between vocabularies version historyalong with past and current referent (individual or organization) LOV goes beyond existing Semantic Web vocabulary searchengines and takes into consideration the valuersquos property type matched with a query to improve vocabulary terms scoring Byproviding an extensive range of data access methods (SPARQL endpoint API data dump or UI) we try to facilitate the reuse ofwell-documented vocabularies in the Linked Data ecosystem We conclude that the adoption in many applications and methodsof LOV shows the benefits of such a set of vocabularies and related features to aid the design and publication of data on the Web

Keywords LOV Linked Open Vocabularies Ontology search Linked Data Vocabulary catalogue

1 Introduction

The last two decades has seen the emergence ofa ldquoSemantic Webrdquo enabling humans and computersystems to exchange data with unambiguous sharedmeaning This vision has been supported by WorldWide Web Consortium (W3C) Recommendations suchas the Resource Description Framework (RDF) RDF-Schema and the Web Ontology Language (OWL)Thanks to a major effort in publishing data followingSemantic Web and Linked Data principles [5] there

Thanks to Ameacutelie Gyrard Thomas Francart Theacuteregraveze Rogez andAnthony McCauley for their help on the project

are now tens of billions of facts spanning hundreds oflinked datasets on the Web covering a wide range oftopics Access to the data is facilitated by portals (suchas Datahub1 or UK Government Data2) or by publish-ing the data directly (such as New York Times3)

Despite the enormous volumes of data now availableon the Web Linked Data suffers from low communityinterest in vocabulary4 management in favour of the

1httpdatahubio2httpdatagovuk3httpdatanytimescom4We use the terms ldquosemantic vocabularyrdquo ldquovocabularyrdquo and ldquoon-

tologyrdquo interchangeably

0000-000014$0000 copy 2014 ndash IOS Press and the authors All rights reserved

2 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

data itself A vocabulary consists of classes propertiesand datatypes that define the meaning of data RDFVocabularies are themselves expressed as Linked DataWhen a vocabulary is not published or not availableany more humans and machines do not have accessto the definition of the terms used to qualify the dataThis breaks the semantic interoperability one of thefundamentals of the Semantic Web

Started in March 2011 as part of the DataLift re-search project [24] and hosted by the Open KnowledgeFoundation the Linked Open Vocabularies (LOV) ini-tiative is now an innovative observatory of the seman-tic vocabularies ecosystem It gathers and makes visi-ble indicators not yet harvested before such as the in-terconnections between vocabularies versioning his-tory along with past and current referent (individual ororganization) The intended purpose of LOV is to pro-mote and facilitate the reuse of well documented vo-cabularies in the Linked Data ecosystem The numberof vocabularies indexed by LOV is constantly growing(511 as of June 2015) thanks to a community effort Itis the only catalogue to the best of our knowledge thatprovides all types of search criteria metadata searchontology search APIs a comprehensive dump file andSPARQL endpoint access According to the categoriesof ontology libraries defined by DrsquoAquin and Noy [9]LOV falls under the categories ldquocurated ontology di-rectoryrdquo and ldquoapplication platformrdquo

This report is structured as follows In the next sec-tion we provide statistics on the usage of LOV In sec-tion 3 we describe the components and features thatconstitute LOV Thereafter in section 4 we explainhow LOV is used to support Data Publication and On-tology Engineering processes Subsequently we pro-vide an overview of some applications and researchprojects based and motivated by the LOV system (sec-tion 5) In section 6 we report on related work Dis-cussion about the limitations and further developmentof LOV is presented in section 7 We finally reach ourconclusions in section 8

2 LOV state

The LOV dataset consists of 511 vocabularies as ofJune 2015 Figure 1 depicts the evolution of the num-ber of vocabularies inserted in the LOV dataset sinceMarch 2011 The addition of new vocabularies to LOVhas been fairly constant with two outstanding events1) an increase beginning of 2012 corresponding to thedeployment of LOV version 2 which automates most

of the vocabulary analyses and 2) a small decreaseand plateau beginning of 2015 corresponding to the de-ployment of LOV version 3 At that time we were con-sidering removing offline vocabularies but finally de-cided to keep them with a special flag making LOV thelast semantic continuity for datasets referencing un-reachable vocabularies

2011

-04

2011

-07

2011

-10

2012

-01

2012

-04

2012

-07

2012

-10

2013

-01

2013

-04

2013

-07

2013

-10

2014

-01

2014

-04

2014

-07

2014

-10

2015

-01

2015

-04

2015

-07

0

100

200

300

400

500Vocab Number

Fig 1 Evolution of the number of vocabularies inLOV from March 2011 to June 2015

By observing the vocabularies contained in LOV asa whole we can extract some interesting facts aboutSemantic Web adoption and dynamics In figure 2we present a distribution of LOV vocabularies by cre-ation date noting the main W3C Recommendationlanguages used RDF RDFS and OWL The distribu-tion follows a bell curve with the peak in 2011 Adecrease of vocabulary creation does not necessarilymean a decrease in use of the technology but ratherthat the existing vocabularies now cover a large part ofthe semantic description needed Figure 3 shows thelast modification date of LOV vocabularies over timedemonstrating a living ecosystem

Overall LOV vocabularies contain 20000 classesand almost 30000 properties with a median num-ber per vocabulary of 9 and 17 respectively Table 1presents a breakdown of LOV dataset content by vo-cabulary element type

Out of 511 vocabularies 6614 explicitly usethe English language for labelscomments Table 2presents the number and percentage of the top five lan-guages detected in LOV Figure 4 shows the distribu-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 3

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

10

20

30

40

RD

F

RD

F1

1

RD

FSan

dR

DF

revi

sed

OW

L

Fig 2 Vocabularies distribution by creation date

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

20

40

60

80

100

Fig 3 Vocabularies distribution by last modificationdate

tion of vocabularies per number of languages explic-itly used 2798 vocabularies still do not provide anylanguage information and only 1468 of vocabular-ies are multilingual In total 45 Languages are used byvocabularies in LOV We will discuss the importanceof providing multilingual vocabularies in section 8

From January to June 2015 more than 14 millionsearches were conducted on LOV5 A breakdown of

5This figure includes searches from the API and UI as well assearches with and without keywords such as ldquoall agents that partic-

Type Count Median per vocabClasses 20034 9Properties 29925 17Instances 5232 0Datatypes 101 0

Table 1 LOV vocabulary elements statistics

Language Nb Vocabs Vocabs (out of 511)English 338 6614French 37 724Spanish 25 489German 19 372Italian 18 352

Table 2 Top five languages and percentage detected inthe LOV catalogue Some vocabularies can make useof multiple languages

searches per element type is provided in table 3 Wecan see that the new feature (from LOV version 3)of agent search (used for instance to identify expertsof a domain in semantic web vocabulary design andpublication) is the most used Searches that includekeywords (and not only filters) are concerned withvocabulary terms Table 4 presents the top 10 termssearched from January to June 2015 Although mostof the searches are performed through the User Inter-face an application ecosystem using LOV APIs hassurfaced as shown in the figure 5

Vocabulary Term Nb searches searchesset 7092 879domain 2518 312some 2473 306status 1486 184iso 639 1389 172same 1285 159state 1235 153supports 1145 142start 887 11space 864 107

Table 4 Top 10 terms searched from January to June2015 by users in LOV

Over the last four years the Linked Open Vocabular-ies initiative has gathered a community of around 480

ipated in vocabulary design and publication in the geo-location do-mainrdquo

4 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Element Type Nb searches searches Nb searches searcheswith keyword with keyword

Term 205682 1419 80728 9284Vocabulary 178837 1234 5918 681Agent 1064597 7347 306 035

Table 3 Type of elements searched from January to June 2015 by users in LOV for all searches and those withkeyword

0 1 2 3 4 5 6 7+0

100

200

300

143

292

49

7 2 7 5 5

Fig 4 Distribution of the vocabularies per languagecount

people interested in various domains including ontol-ogy engineering or data publication the LOV Google+community6 is now an important place to discuss re-port and announce general facts related to vocabular-ies on the Web The LOV dataset itself referenced 389resources of type persons and 111 of type organizationparticipating in vocabulary design andor publication

3 System Components and Features

LOV architecture is composed of four main compo-nents (cf figure 6) 1) Tracking and Analysis Respon-sible for checking for any vocabulary version updateand analysing vocabulariesrsquo specific features 2) Cu-ration Ensuring the high quality of the LOV datasetthrough methods for the community to suggest or editthe catalogue 3) Data Access Provides access to thedata through a large range of methods and protocols

6httpsplusgooglecomcommunities108509791366293651606

Jan-

15

Feb-

15

Mar

-15

Apr

-15

May

-15

Jun-

15

103

104

105

month

sear

ches

num

ber

APIUI

Fig 5 Evolution of the number of searches throughUI and API methods from January to June 2015

to facilitate the use of LOV dataset and 4) LOV UserInterfaces and Application Program Interfaces Eachcomponent provides a set of features detailed in thefollowing subsections

31 Tracking and Analysis

The Tracking and Analysis component takes careof dereferencing7 LOV vocabularies storing a versionlocally (in notation 3 format) and extracting relevantmetadata

311 Vocabulary Level AnalysisAt the Vocabulary level the system extracts three

types of information for each vocabulary version (fig-ure 7)

ndash Metadata associated to the vocabulary This in-formation is explicitly defined within the vocabu-

7URI is looked up over HTTP to return content in a processableformat such as XMLRDF notation 3 or turtle

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 5

Web

LOV Community

LOV APIs

LOV UIs

Tracking andAnalysis

Curation Data Access

LOV Catalogue

RDF Store Full Text Index

RDF Dumps

Version cache

Pre

sen

tati

on

Lay

erB

usi

nes

s La

yer

Dat

a La

yer

Suggestion Edition

Search Engine SPARQL Endpoint

etc

Fig 6 Overview of the Linked Open Vocabularies Ar-chitecture

lary to provide context and useful data about thevocabulary Some high level vocabularies can bereused for that purpose such as Dublin Core8 todescribe authors contributors publishers or Cre-ative Commons9 for the description of a license

ndash Inlinks vocabularies making explicit the links fromanother vocabulary based on the semantic relationof their terms

ndash Outlinks vocabularies making explicit thelinks to another vocabulary based on the semanticrelation of their terms

There are many ways two vocabularies can be in-terlinked Letrsquos consider two vocabularies V 1 and V 2such that V 1 contains a class c1 and a property p1 andV 2 contains a class c2 and a property p2 Relationshipsbetween these two vocabularies can be of the followingtypes (the lines and numbers in brackets correspond toreal examples presented in listing 1)

Metadata some terms from V 2 are reused to providemetadata about V 1 (lines 1 to 2)

Import some terms from V 2 are reused with V 1 tocapture the semantic of the data (lines 3 to 4)

Specialization V 1 defines some subclasses or sub-properties (or local restrictions) of V 2 (lines 5 to8)

Generalization V 1 defines some superclasses or su-perproperties of V 2 (lines 9 to 11)

8httppurlorgdcterms9httpcreativecommonsorgns

OutlinksVocabularies reused

by dcat

dcat

dvia

adms voafspecializes

skos

foaf

dcterms

voafextends

schema

gtfs

voafhasEquivalencesWith

voafextends

voafspecializes

voafmetadataVoc

InlinksVocabularies reusing

dcat

voafmetadataVoc

dcat Metadata

-dctermstitle Data Catalog Vocabularyen-dctermscontributor lthttpgooglecom+RichardCyganiakgt-dctermsissued 2012-04-04^^xsddate-foafhomepage httpwwww3orgTRvocab-dcat-vannpreferredNamespaceUri httpwwww3orgnsdcat

Fig 7 Metadata type vocabulary inlinks and outlinksof DCAT vocabulary

Extension V 1 extends the expressivity of V 2 (lines12 to 15)

Equivalence V 1 declares some equivalent classes orproperties with V 2 (lines 16 to 20)

Disjunction V 1 declares some disjunct classes withV 2 (lines 21 to 23)

These relationships with the exception of Importwhich is represented by owlimports are capturedby the Vocabulary of a Friend10 (VOAF) Table 5presents a breakdown of the occurrences of each rela-tion in LOV

Inter vocabulary relationship Nb RelationsvoafmetadataVoc 2637voafspecializes 1269voafextends 1031owlimports 373voafhasEquivalencesWith 201voafgeneralizes 57voafhasDisjunctionsWith 16

Table 5 Inter-vocabularies relationship types and theirnumber of occurrences in LOV

312 Vocabulary Term Level AnalysisAt the vocabulary term level the system extracts

two types of information

ndash term type (class property datatype or instancedefined in the namespace of the vocabulary)

10httplovokfnorgvocommonsvoaf

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

2 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

data itself A vocabulary consists of classes propertiesand datatypes that define the meaning of data RDFVocabularies are themselves expressed as Linked DataWhen a vocabulary is not published or not availableany more humans and machines do not have accessto the definition of the terms used to qualify the dataThis breaks the semantic interoperability one of thefundamentals of the Semantic Web

Started in March 2011 as part of the DataLift re-search project [24] and hosted by the Open KnowledgeFoundation the Linked Open Vocabularies (LOV) ini-tiative is now an innovative observatory of the seman-tic vocabularies ecosystem It gathers and makes visi-ble indicators not yet harvested before such as the in-terconnections between vocabularies versioning his-tory along with past and current referent (individual ororganization) The intended purpose of LOV is to pro-mote and facilitate the reuse of well documented vo-cabularies in the Linked Data ecosystem The numberof vocabularies indexed by LOV is constantly growing(511 as of June 2015) thanks to a community effort Itis the only catalogue to the best of our knowledge thatprovides all types of search criteria metadata searchontology search APIs a comprehensive dump file andSPARQL endpoint access According to the categoriesof ontology libraries defined by DrsquoAquin and Noy [9]LOV falls under the categories ldquocurated ontology di-rectoryrdquo and ldquoapplication platformrdquo

This report is structured as follows In the next sec-tion we provide statistics on the usage of LOV In sec-tion 3 we describe the components and features thatconstitute LOV Thereafter in section 4 we explainhow LOV is used to support Data Publication and On-tology Engineering processes Subsequently we pro-vide an overview of some applications and researchprojects based and motivated by the LOV system (sec-tion 5) In section 6 we report on related work Dis-cussion about the limitations and further developmentof LOV is presented in section 7 We finally reach ourconclusions in section 8

2 LOV state

The LOV dataset consists of 511 vocabularies as ofJune 2015 Figure 1 depicts the evolution of the num-ber of vocabularies inserted in the LOV dataset sinceMarch 2011 The addition of new vocabularies to LOVhas been fairly constant with two outstanding events1) an increase beginning of 2012 corresponding to thedeployment of LOV version 2 which automates most

of the vocabulary analyses and 2) a small decreaseand plateau beginning of 2015 corresponding to the de-ployment of LOV version 3 At that time we were con-sidering removing offline vocabularies but finally de-cided to keep them with a special flag making LOV thelast semantic continuity for datasets referencing un-reachable vocabularies

2011

-04

2011

-07

2011

-10

2012

-01

2012

-04

2012

-07

2012

-10

2013

-01

2013

-04

2013

-07

2013

-10

2014

-01

2014

-04

2014

-07

2014

-10

2015

-01

2015

-04

2015

-07

0

100

200

300

400

500Vocab Number

Fig 1 Evolution of the number of vocabularies inLOV from March 2011 to June 2015

By observing the vocabularies contained in LOV asa whole we can extract some interesting facts aboutSemantic Web adoption and dynamics In figure 2we present a distribution of LOV vocabularies by cre-ation date noting the main W3C Recommendationlanguages used RDF RDFS and OWL The distribu-tion follows a bell curve with the peak in 2011 Adecrease of vocabulary creation does not necessarilymean a decrease in use of the technology but ratherthat the existing vocabularies now cover a large part ofthe semantic description needed Figure 3 shows thelast modification date of LOV vocabularies over timedemonstrating a living ecosystem

Overall LOV vocabularies contain 20000 classesand almost 30000 properties with a median num-ber per vocabulary of 9 and 17 respectively Table 1presents a breakdown of LOV dataset content by vo-cabulary element type

Out of 511 vocabularies 6614 explicitly usethe English language for labelscomments Table 2presents the number and percentage of the top five lan-guages detected in LOV Figure 4 shows the distribu-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 3

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

10

20

30

40

RD

F

RD

F1

1

RD

FSan

dR

DF

revi

sed

OW

L

Fig 2 Vocabularies distribution by creation date

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

20

40

60

80

100

Fig 3 Vocabularies distribution by last modificationdate

tion of vocabularies per number of languages explic-itly used 2798 vocabularies still do not provide anylanguage information and only 1468 of vocabular-ies are multilingual In total 45 Languages are used byvocabularies in LOV We will discuss the importanceof providing multilingual vocabularies in section 8

From January to June 2015 more than 14 millionsearches were conducted on LOV5 A breakdown of

5This figure includes searches from the API and UI as well assearches with and without keywords such as ldquoall agents that partic-

Type Count Median per vocabClasses 20034 9Properties 29925 17Instances 5232 0Datatypes 101 0

Table 1 LOV vocabulary elements statistics

Language Nb Vocabs Vocabs (out of 511)English 338 6614French 37 724Spanish 25 489German 19 372Italian 18 352

Table 2 Top five languages and percentage detected inthe LOV catalogue Some vocabularies can make useof multiple languages

searches per element type is provided in table 3 Wecan see that the new feature (from LOV version 3)of agent search (used for instance to identify expertsof a domain in semantic web vocabulary design andpublication) is the most used Searches that includekeywords (and not only filters) are concerned withvocabulary terms Table 4 presents the top 10 termssearched from January to June 2015 Although mostof the searches are performed through the User Inter-face an application ecosystem using LOV APIs hassurfaced as shown in the figure 5

Vocabulary Term Nb searches searchesset 7092 879domain 2518 312some 2473 306status 1486 184iso 639 1389 172same 1285 159state 1235 153supports 1145 142start 887 11space 864 107

Table 4 Top 10 terms searched from January to June2015 by users in LOV

Over the last four years the Linked Open Vocabular-ies initiative has gathered a community of around 480

ipated in vocabulary design and publication in the geo-location do-mainrdquo

4 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Element Type Nb searches searches Nb searches searcheswith keyword with keyword

Term 205682 1419 80728 9284Vocabulary 178837 1234 5918 681Agent 1064597 7347 306 035

Table 3 Type of elements searched from January to June 2015 by users in LOV for all searches and those withkeyword

0 1 2 3 4 5 6 7+0

100

200

300

143

292

49

7 2 7 5 5

Fig 4 Distribution of the vocabularies per languagecount

people interested in various domains including ontol-ogy engineering or data publication the LOV Google+community6 is now an important place to discuss re-port and announce general facts related to vocabular-ies on the Web The LOV dataset itself referenced 389resources of type persons and 111 of type organizationparticipating in vocabulary design andor publication

3 System Components and Features

LOV architecture is composed of four main compo-nents (cf figure 6) 1) Tracking and Analysis Respon-sible for checking for any vocabulary version updateand analysing vocabulariesrsquo specific features 2) Cu-ration Ensuring the high quality of the LOV datasetthrough methods for the community to suggest or editthe catalogue 3) Data Access Provides access to thedata through a large range of methods and protocols

6httpsplusgooglecomcommunities108509791366293651606

Jan-

15

Feb-

15

Mar

-15

Apr

-15

May

-15

Jun-

15

103

104

105

month

sear

ches

num

ber

APIUI

Fig 5 Evolution of the number of searches throughUI and API methods from January to June 2015

to facilitate the use of LOV dataset and 4) LOV UserInterfaces and Application Program Interfaces Eachcomponent provides a set of features detailed in thefollowing subsections

31 Tracking and Analysis

The Tracking and Analysis component takes careof dereferencing7 LOV vocabularies storing a versionlocally (in notation 3 format) and extracting relevantmetadata

311 Vocabulary Level AnalysisAt the Vocabulary level the system extracts three

types of information for each vocabulary version (fig-ure 7)

ndash Metadata associated to the vocabulary This in-formation is explicitly defined within the vocabu-

7URI is looked up over HTTP to return content in a processableformat such as XMLRDF notation 3 or turtle

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 5

Web

LOV Community

LOV APIs

LOV UIs

Tracking andAnalysis

Curation Data Access

LOV Catalogue

RDF Store Full Text Index

RDF Dumps

Version cache

Pre

sen

tati

on

Lay

erB

usi

nes

s La

yer

Dat

a La

yer

Suggestion Edition

Search Engine SPARQL Endpoint

etc

Fig 6 Overview of the Linked Open Vocabularies Ar-chitecture

lary to provide context and useful data about thevocabulary Some high level vocabularies can bereused for that purpose such as Dublin Core8 todescribe authors contributors publishers or Cre-ative Commons9 for the description of a license

ndash Inlinks vocabularies making explicit the links fromanother vocabulary based on the semantic relationof their terms

ndash Outlinks vocabularies making explicit thelinks to another vocabulary based on the semanticrelation of their terms

There are many ways two vocabularies can be in-terlinked Letrsquos consider two vocabularies V 1 and V 2such that V 1 contains a class c1 and a property p1 andV 2 contains a class c2 and a property p2 Relationshipsbetween these two vocabularies can be of the followingtypes (the lines and numbers in brackets correspond toreal examples presented in listing 1)

Metadata some terms from V 2 are reused to providemetadata about V 1 (lines 1 to 2)

Import some terms from V 2 are reused with V 1 tocapture the semantic of the data (lines 3 to 4)

Specialization V 1 defines some subclasses or sub-properties (or local restrictions) of V 2 (lines 5 to8)

Generalization V 1 defines some superclasses or su-perproperties of V 2 (lines 9 to 11)

8httppurlorgdcterms9httpcreativecommonsorgns

OutlinksVocabularies reused

by dcat

dcat

dvia

adms voafspecializes

skos

foaf

dcterms

voafextends

schema

gtfs

voafhasEquivalencesWith

voafextends

voafspecializes

voafmetadataVoc

InlinksVocabularies reusing

dcat

voafmetadataVoc

dcat Metadata

-dctermstitle Data Catalog Vocabularyen-dctermscontributor lthttpgooglecom+RichardCyganiakgt-dctermsissued 2012-04-04^^xsddate-foafhomepage httpwwww3orgTRvocab-dcat-vannpreferredNamespaceUri httpwwww3orgnsdcat

Fig 7 Metadata type vocabulary inlinks and outlinksof DCAT vocabulary

Extension V 1 extends the expressivity of V 2 (lines12 to 15)

Equivalence V 1 declares some equivalent classes orproperties with V 2 (lines 16 to 20)

Disjunction V 1 declares some disjunct classes withV 2 (lines 21 to 23)

These relationships with the exception of Importwhich is represented by owlimports are capturedby the Vocabulary of a Friend10 (VOAF) Table 5presents a breakdown of the occurrences of each rela-tion in LOV

Inter vocabulary relationship Nb RelationsvoafmetadataVoc 2637voafspecializes 1269voafextends 1031owlimports 373voafhasEquivalencesWith 201voafgeneralizes 57voafhasDisjunctionsWith 16

Table 5 Inter-vocabularies relationship types and theirnumber of occurrences in LOV

312 Vocabulary Term Level AnalysisAt the vocabulary term level the system extracts

two types of information

ndash term type (class property datatype or instancedefined in the namespace of the vocabulary)

10httplovokfnorgvocommonsvoaf

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 3

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

10

20

30

40

RD

F

RD

F1

1

RD

FSan

dR

DF

revi

sed

OW

L

Fig 2 Vocabularies distribution by creation date

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

0

20

40

60

80

100

Fig 3 Vocabularies distribution by last modificationdate

tion of vocabularies per number of languages explic-itly used 2798 vocabularies still do not provide anylanguage information and only 1468 of vocabular-ies are multilingual In total 45 Languages are used byvocabularies in LOV We will discuss the importanceof providing multilingual vocabularies in section 8

From January to June 2015 more than 14 millionsearches were conducted on LOV5 A breakdown of

5This figure includes searches from the API and UI as well assearches with and without keywords such as ldquoall agents that partic-

Type Count Median per vocabClasses 20034 9Properties 29925 17Instances 5232 0Datatypes 101 0

Table 1 LOV vocabulary elements statistics

Language Nb Vocabs Vocabs (out of 511)English 338 6614French 37 724Spanish 25 489German 19 372Italian 18 352

Table 2 Top five languages and percentage detected inthe LOV catalogue Some vocabularies can make useof multiple languages

searches per element type is provided in table 3 Wecan see that the new feature (from LOV version 3)of agent search (used for instance to identify expertsof a domain in semantic web vocabulary design andpublication) is the most used Searches that includekeywords (and not only filters) are concerned withvocabulary terms Table 4 presents the top 10 termssearched from January to June 2015 Although mostof the searches are performed through the User Inter-face an application ecosystem using LOV APIs hassurfaced as shown in the figure 5

Vocabulary Term Nb searches searchesset 7092 879domain 2518 312some 2473 306status 1486 184iso 639 1389 172same 1285 159state 1235 153supports 1145 142start 887 11space 864 107

Table 4 Top 10 terms searched from January to June2015 by users in LOV

Over the last four years the Linked Open Vocabular-ies initiative has gathered a community of around 480

ipated in vocabulary design and publication in the geo-location do-mainrdquo

4 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Element Type Nb searches searches Nb searches searcheswith keyword with keyword

Term 205682 1419 80728 9284Vocabulary 178837 1234 5918 681Agent 1064597 7347 306 035

Table 3 Type of elements searched from January to June 2015 by users in LOV for all searches and those withkeyword

0 1 2 3 4 5 6 7+0

100

200

300

143

292

49

7 2 7 5 5

Fig 4 Distribution of the vocabularies per languagecount

people interested in various domains including ontol-ogy engineering or data publication the LOV Google+community6 is now an important place to discuss re-port and announce general facts related to vocabular-ies on the Web The LOV dataset itself referenced 389resources of type persons and 111 of type organizationparticipating in vocabulary design andor publication

3 System Components and Features

LOV architecture is composed of four main compo-nents (cf figure 6) 1) Tracking and Analysis Respon-sible for checking for any vocabulary version updateand analysing vocabulariesrsquo specific features 2) Cu-ration Ensuring the high quality of the LOV datasetthrough methods for the community to suggest or editthe catalogue 3) Data Access Provides access to thedata through a large range of methods and protocols

6httpsplusgooglecomcommunities108509791366293651606

Jan-

15

Feb-

15

Mar

-15

Apr

-15

May

-15

Jun-

15

103

104

105

month

sear

ches

num

ber

APIUI

Fig 5 Evolution of the number of searches throughUI and API methods from January to June 2015

to facilitate the use of LOV dataset and 4) LOV UserInterfaces and Application Program Interfaces Eachcomponent provides a set of features detailed in thefollowing subsections

31 Tracking and Analysis

The Tracking and Analysis component takes careof dereferencing7 LOV vocabularies storing a versionlocally (in notation 3 format) and extracting relevantmetadata

311 Vocabulary Level AnalysisAt the Vocabulary level the system extracts three

types of information for each vocabulary version (fig-ure 7)

ndash Metadata associated to the vocabulary This in-formation is explicitly defined within the vocabu-

7URI is looked up over HTTP to return content in a processableformat such as XMLRDF notation 3 or turtle

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 5

Web

LOV Community

LOV APIs

LOV UIs

Tracking andAnalysis

Curation Data Access

LOV Catalogue

RDF Store Full Text Index

RDF Dumps

Version cache

Pre

sen

tati

on

Lay

erB

usi

nes

s La

yer

Dat

a La

yer

Suggestion Edition

Search Engine SPARQL Endpoint

etc

Fig 6 Overview of the Linked Open Vocabularies Ar-chitecture

lary to provide context and useful data about thevocabulary Some high level vocabularies can bereused for that purpose such as Dublin Core8 todescribe authors contributors publishers or Cre-ative Commons9 for the description of a license

ndash Inlinks vocabularies making explicit the links fromanother vocabulary based on the semantic relationof their terms

ndash Outlinks vocabularies making explicit thelinks to another vocabulary based on the semanticrelation of their terms

There are many ways two vocabularies can be in-terlinked Letrsquos consider two vocabularies V 1 and V 2such that V 1 contains a class c1 and a property p1 andV 2 contains a class c2 and a property p2 Relationshipsbetween these two vocabularies can be of the followingtypes (the lines and numbers in brackets correspond toreal examples presented in listing 1)

Metadata some terms from V 2 are reused to providemetadata about V 1 (lines 1 to 2)

Import some terms from V 2 are reused with V 1 tocapture the semantic of the data (lines 3 to 4)

Specialization V 1 defines some subclasses or sub-properties (or local restrictions) of V 2 (lines 5 to8)

Generalization V 1 defines some superclasses or su-perproperties of V 2 (lines 9 to 11)

8httppurlorgdcterms9httpcreativecommonsorgns

OutlinksVocabularies reused

by dcat

dcat

dvia

adms voafspecializes

skos

foaf

dcterms

voafextends

schema

gtfs

voafhasEquivalencesWith

voafextends

voafspecializes

voafmetadataVoc

InlinksVocabularies reusing

dcat

voafmetadataVoc

dcat Metadata

-dctermstitle Data Catalog Vocabularyen-dctermscontributor lthttpgooglecom+RichardCyganiakgt-dctermsissued 2012-04-04^^xsddate-foafhomepage httpwwww3orgTRvocab-dcat-vannpreferredNamespaceUri httpwwww3orgnsdcat

Fig 7 Metadata type vocabulary inlinks and outlinksof DCAT vocabulary

Extension V 1 extends the expressivity of V 2 (lines12 to 15)

Equivalence V 1 declares some equivalent classes orproperties with V 2 (lines 16 to 20)

Disjunction V 1 declares some disjunct classes withV 2 (lines 21 to 23)

These relationships with the exception of Importwhich is represented by owlimports are capturedby the Vocabulary of a Friend10 (VOAF) Table 5presents a breakdown of the occurrences of each rela-tion in LOV

Inter vocabulary relationship Nb RelationsvoafmetadataVoc 2637voafspecializes 1269voafextends 1031owlimports 373voafhasEquivalencesWith 201voafgeneralizes 57voafhasDisjunctionsWith 16

Table 5 Inter-vocabularies relationship types and theirnumber of occurrences in LOV

312 Vocabulary Term Level AnalysisAt the vocabulary term level the system extracts

two types of information

ndash term type (class property datatype or instancedefined in the namespace of the vocabulary)

10httplovokfnorgvocommonsvoaf

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

4 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Element Type Nb searches searches Nb searches searcheswith keyword with keyword

Term 205682 1419 80728 9284Vocabulary 178837 1234 5918 681Agent 1064597 7347 306 035

Table 3 Type of elements searched from January to June 2015 by users in LOV for all searches and those withkeyword

0 1 2 3 4 5 6 7+0

100

200

300

143

292

49

7 2 7 5 5

Fig 4 Distribution of the vocabularies per languagecount

people interested in various domains including ontol-ogy engineering or data publication the LOV Google+community6 is now an important place to discuss re-port and announce general facts related to vocabular-ies on the Web The LOV dataset itself referenced 389resources of type persons and 111 of type organizationparticipating in vocabulary design andor publication

3 System Components and Features

LOV architecture is composed of four main compo-nents (cf figure 6) 1) Tracking and Analysis Respon-sible for checking for any vocabulary version updateand analysing vocabulariesrsquo specific features 2) Cu-ration Ensuring the high quality of the LOV datasetthrough methods for the community to suggest or editthe catalogue 3) Data Access Provides access to thedata through a large range of methods and protocols

6httpsplusgooglecomcommunities108509791366293651606

Jan-

15

Feb-

15

Mar

-15

Apr

-15

May

-15

Jun-

15

103

104

105

month

sear

ches

num

ber

APIUI

Fig 5 Evolution of the number of searches throughUI and API methods from January to June 2015

to facilitate the use of LOV dataset and 4) LOV UserInterfaces and Application Program Interfaces Eachcomponent provides a set of features detailed in thefollowing subsections

31 Tracking and Analysis

The Tracking and Analysis component takes careof dereferencing7 LOV vocabularies storing a versionlocally (in notation 3 format) and extracting relevantmetadata

311 Vocabulary Level AnalysisAt the Vocabulary level the system extracts three

types of information for each vocabulary version (fig-ure 7)

ndash Metadata associated to the vocabulary This in-formation is explicitly defined within the vocabu-

7URI is looked up over HTTP to return content in a processableformat such as XMLRDF notation 3 or turtle

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 5

Web

LOV Community

LOV APIs

LOV UIs

Tracking andAnalysis

Curation Data Access

LOV Catalogue

RDF Store Full Text Index

RDF Dumps

Version cache

Pre

sen

tati

on

Lay

erB

usi

nes

s La

yer

Dat

a La

yer

Suggestion Edition

Search Engine SPARQL Endpoint

etc

Fig 6 Overview of the Linked Open Vocabularies Ar-chitecture

lary to provide context and useful data about thevocabulary Some high level vocabularies can bereused for that purpose such as Dublin Core8 todescribe authors contributors publishers or Cre-ative Commons9 for the description of a license

ndash Inlinks vocabularies making explicit the links fromanother vocabulary based on the semantic relationof their terms

ndash Outlinks vocabularies making explicit thelinks to another vocabulary based on the semanticrelation of their terms

There are many ways two vocabularies can be in-terlinked Letrsquos consider two vocabularies V 1 and V 2such that V 1 contains a class c1 and a property p1 andV 2 contains a class c2 and a property p2 Relationshipsbetween these two vocabularies can be of the followingtypes (the lines and numbers in brackets correspond toreal examples presented in listing 1)

Metadata some terms from V 2 are reused to providemetadata about V 1 (lines 1 to 2)

Import some terms from V 2 are reused with V 1 tocapture the semantic of the data (lines 3 to 4)

Specialization V 1 defines some subclasses or sub-properties (or local restrictions) of V 2 (lines 5 to8)

Generalization V 1 defines some superclasses or su-perproperties of V 2 (lines 9 to 11)

8httppurlorgdcterms9httpcreativecommonsorgns

OutlinksVocabularies reused

by dcat

dcat

dvia

adms voafspecializes

skos

foaf

dcterms

voafextends

schema

gtfs

voafhasEquivalencesWith

voafextends

voafspecializes

voafmetadataVoc

InlinksVocabularies reusing

dcat

voafmetadataVoc

dcat Metadata

-dctermstitle Data Catalog Vocabularyen-dctermscontributor lthttpgooglecom+RichardCyganiakgt-dctermsissued 2012-04-04^^xsddate-foafhomepage httpwwww3orgTRvocab-dcat-vannpreferredNamespaceUri httpwwww3orgnsdcat

Fig 7 Metadata type vocabulary inlinks and outlinksof DCAT vocabulary

Extension V 1 extends the expressivity of V 2 (lines12 to 15)

Equivalence V 1 declares some equivalent classes orproperties with V 2 (lines 16 to 20)

Disjunction V 1 declares some disjunct classes withV 2 (lines 21 to 23)

These relationships with the exception of Importwhich is represented by owlimports are capturedby the Vocabulary of a Friend10 (VOAF) Table 5presents a breakdown of the occurrences of each rela-tion in LOV

Inter vocabulary relationship Nb RelationsvoafmetadataVoc 2637voafspecializes 1269voafextends 1031owlimports 373voafhasEquivalencesWith 201voafgeneralizes 57voafhasDisjunctionsWith 16

Table 5 Inter-vocabularies relationship types and theirnumber of occurrences in LOV

312 Vocabulary Term Level AnalysisAt the vocabulary term level the system extracts

two types of information

ndash term type (class property datatype or instancedefined in the namespace of the vocabulary)

10httplovokfnorgvocommonsvoaf

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 5

Web

LOV Community

LOV APIs

LOV UIs

Tracking andAnalysis

Curation Data Access

LOV Catalogue

RDF Store Full Text Index

RDF Dumps

Version cache

Pre

sen

tati

on

Lay

erB

usi

nes

s La

yer

Dat

a La

yer

Suggestion Edition

Search Engine SPARQL Endpoint

etc

Fig 6 Overview of the Linked Open Vocabularies Ar-chitecture

lary to provide context and useful data about thevocabulary Some high level vocabularies can bereused for that purpose such as Dublin Core8 todescribe authors contributors publishers or Cre-ative Commons9 for the description of a license

ndash Inlinks vocabularies making explicit the links fromanother vocabulary based on the semantic relationof their terms

ndash Outlinks vocabularies making explicit thelinks to another vocabulary based on the semanticrelation of their terms

There are many ways two vocabularies can be in-terlinked Letrsquos consider two vocabularies V 1 and V 2such that V 1 contains a class c1 and a property p1 andV 2 contains a class c2 and a property p2 Relationshipsbetween these two vocabularies can be of the followingtypes (the lines and numbers in brackets correspond toreal examples presented in listing 1)

Metadata some terms from V 2 are reused to providemetadata about V 1 (lines 1 to 2)

Import some terms from V 2 are reused with V 1 tocapture the semantic of the data (lines 3 to 4)

Specialization V 1 defines some subclasses or sub-properties (or local restrictions) of V 2 (lines 5 to8)

Generalization V 1 defines some superclasses or su-perproperties of V 2 (lines 9 to 11)

8httppurlorgdcterms9httpcreativecommonsorgns

OutlinksVocabularies reused

by dcat

dcat

dvia

adms voafspecializes

skos

foaf

dcterms

voafextends

schema

gtfs

voafhasEquivalencesWith

voafextends

voafspecializes

voafmetadataVoc

InlinksVocabularies reusing

dcat

voafmetadataVoc

dcat Metadata

-dctermstitle Data Catalog Vocabularyen-dctermscontributor lthttpgooglecom+RichardCyganiakgt-dctermsissued 2012-04-04^^xsddate-foafhomepage httpwwww3orgTRvocab-dcat-vannpreferredNamespaceUri httpwwww3orgnsdcat

Fig 7 Metadata type vocabulary inlinks and outlinksof DCAT vocabulary

Extension V 1 extends the expressivity of V 2 (lines12 to 15)

Equivalence V 1 declares some equivalent classes orproperties with V 2 (lines 16 to 20)

Disjunction V 1 declares some disjunct classes withV 2 (lines 21 to 23)

These relationships with the exception of Importwhich is represented by owlimports are capturedby the Vocabulary of a Friend10 (VOAF) Table 5presents a breakdown of the occurrences of each rela-tion in LOV

Inter vocabulary relationship Nb RelationsvoafmetadataVoc 2637voafspecializes 1269voafextends 1031owlimports 373voafhasEquivalencesWith 201voafgeneralizes 57voafhasDisjunctionsWith 16

Table 5 Inter-vocabularies relationship types and theirnumber of occurrences in LOV

312 Vocabulary Term Level AnalysisAt the vocabulary term level the system extracts

two types of information

ndash term type (class property datatype or instancedefined in the namespace of the vocabulary)

10httplovokfnorgvocommonsvoaf

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

6 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 1 Examples of Inter-vocabulary relationships

1 Metadata2 lthttpwwww3org200402skoscoregt dcttitle SKOS Vocabularyen3 Import - V1 imports V24 lthttppurlorgNETc4dmeventowlgt owlimports lthttpwwww3org2006timegt5 Specialization - c1 is subclass of c26 lthttpopenvocaborgtermsBirthgt rdfssubClassOf lthttppurlorgNETc4dmeventowlEventgt7 Specialization - p1 is subproperty of p28 lthttppurlorgsparfabiohasEmbargoDategt rdfssubPropertyOf lthttppurlorgdctermsdategt9 Generalization - c1 has for narrower match c2

10 lthttpsemanticwebcsvunl200911semPlacegt skosnarrowMatch11 lthttpwwww3org200301geowgs84_posSpatialThinggt12 Extension - p1 is inverse of p213 lthttpvivoweborgontologycoretranslatorOfgt owlinverseOf lthttppurlorgontologybibotranslatorgt14 Extension - p1 has for domain c215 lthttpxmlnscomfoaf01based_neargt rdfsdomain lthttpwwww3org200301geowgs84_posSpatialThinggt16 Equivalence - p1 is equivalent to p217 lthttplsdiscsugaeduprojectssemdisopusjournal_namegt owlequivalentProperty18 lthttppurlorgnetnknoufnsbibtexhasJournalgt19 Equivalence - c1 is equivalent to c220 lthttpwwwlocgovmadsrdfv1Languagegt owlequivalentClass lthttppurlorgdctermsLinguisticSystemgt21 Disjunction - c1 is disjoint with c222 lthttpwwwontologydesignpatternsorgontdulDULowlTimeIntervalgtowldisjointWith23 lthttpwwwontologydesignpatternsorgontdulontopicowlSubjectSpacegt

which is provided for indexing by the search en-gine so a user can filter a search based on thisinformation

ndash term natural language annotations with their pred-icate (eg rdfslabel Temperatureen) This information is provided for indexing bythe search engine and will later be used (cf sec-tion 331) in the ranking algorithm

The information concerning vocabulary terms use inLinked Open Data also named popularity is usedin LOV search results ranking as explained in sec-tion 331 This information is not natively presentin the vocabularies and can not be inferred from theLOV dataset The LODStats project gathers compre-hensive statistics about RDF datasets [11] LOV reg-ularly fetches LODStats raw data11 using the Vocab-ulary of Interlinked Datasets (VoID) [1] and the DataCube vocabulary We pre-process LODStats data be-fore inserting it to LOV Indeed There are many dupli-cates in LODStats representing in fact the same vocab-ulary URI (eg foaf has three different records12 andhas to be mapped to a single entry in LOV

11We keep synchronised the statistics available at httpstatslod2eurdfdocsvoid Unfortunately this file hasbeen unavailable since June 2014 which explains some differencesbetween the statistics we use and LODStats

12httpstatslod2euvocabulariessearch=foaf

32 Curation

The vocabulary collection is maintained by curatorsin charge of validating inserting a vocabulary in theLOV ecosystem and assigning a detailed review

321 Vocabulary InsertionCompared with other vocabulary catalogues (cf

section 6) LOV relies on a semi-automated process forvocabulary insertion illustrated in figure 8 Whereas anautomated process puts the emphasis on the volume inour process we focus on the quality of each vocabularyand therefore the quality of the overall LOV ecosys-tem Suggestions come from the community and frominter-vocabulary reference links Our system providesa feature to suggest13 the insertion of a new vocabularyThis feature allows a user to check what informationthe LOV application can automatically detect and ex-tract LOV curators then check if the vocabulary fallsin the scope of LOV and if it meets LOV vocabularyquality requirements to be reused

1 a vocabulary should be written in RDF and bedereferenceable

2 a vocabulary should be parsable without error(warning tolerated)

3 all vocabulary terms (classes properties anddatatypes) in a vocabulary should have anrdfslabel

13httplovokfnorgdatasetlovsuggest

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 7

4 a vocabulary should refer to and reuse relevantexisting ones

5 a vocabulary should provide some metadata aboutthe vocabulary itself (a minima a title)

If a suggested vocabulary meets these criteria it is theninserted in the LOV catalogue During this processLOV curators keep the authors informed and help themto improve their vocabulary quality As a result of ourexperience in vocabulary publication we are now ableto publish a handbook about metadata recommenda-tions for Linked Open Data vocabularies to help inpublishing well documented vocabularies [27]

Author designs vocabulary

Author suggests a vocabulary

LOV Robot suggests a vocabulary

LOV Robot detects any reference to new

vocabs in LOV ones

Curator looks if thevocabulary is agood fit for LOV

Is a good fit

Curator looks if thevocabulary meets

LOV quality

Does meet quality

Curator inserts the vocabulary in LOV

Curator sendsan email tothe author

NO

NO

YES

YES

Start

End

Fig 8 Curation workflow for vocabulary insertion

322 Vocabulary ReviewWhen automatic extraction of metadata fails LOV

curators enhance the description available in the sys-tem and notify the vocabulary authors The documen-tation provided by the LOV application assists any userin the task of understanding the semantics of each vo-cabulary term and therefore of any data using it Forinstance information about the creator and publisheris a key indication for a vocabulary user in case helpor clarification is required from the author or to as-sess the stability of that artifact About 55 of vocab-

ularies specify at least one creator contributor or edi-tor We augment this information using manually gath-ered information leading to the inclusion of data aboutthe creator in over 85 of vocabularies in LOV Thedatabase stores every version of a vocabulary since itsfirst issue For each version a user can access the file(particularly useful when the original online file is nolonger available) As illustrated in figure 9 a script isin place to automatically check for vocabulary updateson a daily basis If a new version has been detected theversion is stored locally the statistics about that vocab-ulary recomputed and a notification to the curators is-sued Similarly we ensure that curated review for eachvocabulary is less than one year old by sending cura-tors a notification when a vocabulary review is olderthan eleven months In both case curators update thevocabulary review accordingly

LOV Robot checksif vocabulary

review is older than 11 months

Curator updatesvocabulary review

and metadata if needed

Has changed

LOV Robot add the new version and

update vocabularymetadata

NO

YES

LOV Robot checksdaily if vocabulary

has changed

End

LOV Robot notifiesthe curator

LOV Robot notifiesthe curator

Is reviewgt11m

End

NO

YES

Start End

Fig 9 Curation workflow for systematic vocabularyreview

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

8 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

33 Data Access

LOV system (code and data) is published under Cre-ative Commons 40 license14 (CC BY 40) Four meth-ods are offered for users and applications to access theLOV data 1) query the LOV search engine to find themost relevant vocabulary terms vocabularies or agentsmatching keywords andor filters 2) download datadumps of the LOV catalogue in RDF Notation 3 for-mat or the LOV catalogue and the latest version of eachvocabulary in RDF N-quads format 3) run SPARQLqueries on the LOV SPARQL Endpoint and 4) usethe LOV system Application Program Interface (API)which provides a full access to LOV data for softwareapplications

331 Search EngineIn [6] Butt et al compare eight different ranking

methods grouped in two categories for querying vo-cabulary terms

ndash Content-based Ranking Models tf-idf BM25Vector Space Model and Class Match Measure

ndash Graph-based Ranking Models PageRank Den-sity Measure Semantic Similarity Measure andBetweenness Measure

Based on their findings we defined a new rankingmethod adapting Term frequency inverse document fre-quency (tf-idf) to the graph-structure of vocabulariesWhen compared to the other methods tf-idf takes intoaccount the relevance and importance of a resource tothe query when assigning a weight to a particular vo-cabulary for a given query term We reuse the aug-mented frequency variation of term frequency formulato prevent a bias towards longer vocabularies Becauseof the inherit graph structure of vocabularies tf-idfneeds to be tailored so that instead of using a word asthe basic unit for measuring we are considering a vo-cabulary term t in a vocabulary V as the measuringunit The equation 1 presents the adaptation of tf-idf tovocabularies (a definition of the variables used in thispaperrsquos equations is provided in table 6)

tf(t V ) = 05 +05 lowast f(t V )

max f(ti V ) ti isin V

idf(tV) = logN

| V isin V t isin V |

(1)

14httpscreativecommonsorglicensesby40

Variable DescriptionV Set of VocabulariesV A vocabulary V isin VN Number of vocabularies in Vt A vocabulary term URI (class property

instance or datatype) t isin V t isin URIQ Query stringqi Query term i of QσV Set of matched URIs for Q in VσV (qi) Set of matched URIs for qi in V

forallti isin σV ti isin V ti matches qip A term predicate p isin URID Set of DatasetsD A Dataset D isin DM(ti) Number of Datasets D in D ti isin D

Table 6 Definition of the variables used in the equa-tions

As highlighted in [6] and [23] the notion of pop-ularity of a vocabulary term across the LOD datasetsset D is significantly important In equation 2 we in-troduce a popularity measure function of the normali-sation of the frequency f(tD) of a term URI t in theset of datasets D and the normalisation of the numberof datasets in which a term URI appearsM(t) t isin DBy using maximum in the normalisation we emphasisethe most used terms resulting in a consensus withinthe community This measure will give a higher scoreto terms that are often used in datasets and across alarge number of datasets

pop(tD) = f(tD)max f(tiD) ti isin D

lowast M(t)

max M(ti) ti isin D

(2)

When compared with RDF datasets best practicesabout vocabulary publication makes their structureconsensual and stable It becomes then intuitive to as-sign more importance to a vocabulary term matching aquery on the value of the property rdfslabel thandctermscomment The equation 3 extends thelucene based search engine elasticsearch inner field-length norm lengthNorm(field) which attaches ahigher weight to shorter fields by combining it with aproperty-level boost boost(p(t)) Using this property-level boost we can assign a different score depend-ing on which label property a query term matched We

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 9

distinguish four different label property categories onwhich a query term could match

ndash Local name (URI without the namespace) Whilea URI is not supposed to carry any meaning it is aconvention to use a compressed form of a term la-bel to construct the local name It becomes there-fore an important artifact for term matching forwhich the highest score will be assigned An ex-ample of local name matching the term ldquopersonrdquois httpschemaorgPerson

ndash Primary labels The highest score will also beassigned for matches on rdfslabel dcetitle dctermstitle skosprefLabelproperties An example of primary label match-ing the term ldquopersonrdquo is rdfslabel Per-sonen

ndash Secondary labels We define as secondary la-bel the following properties rdfscommentdcedescription dctermsdescriptionskosaltLabel A medium score is assignedfor matches on these properties An example ofsecondary label matching the term ldquopersonrdquo isdctermsdescription Examples of a Cre-ator include a person an organization or a ser-viceen

ndash Tertiary labels Finally all properties not fallingin the previous categories are considered as ter-tiary labels for which a low score is assigned Anexample of tertiary label matching the term ldquoper-sonrdquo is httpmetadataregistryorguriprofileRegApname Personen

norm(t V ) = lengthNorm(field)

lowastprodpisinV

boost(p(t))(3)

For every vocabulary in LOV terms (classes prop-erties datatypes instances) are indexed and a full textsearch feature is offered15 Human users or agents canfurther narrow a search by filtering on term type (classproperty datatype instance) language vocabulary do-main and vocabulary

The final score of t for a query Q (equation 4) is acombination of tf-idf the importance of label proper-ties of t on which query terms matched and the pop-ularity of that term in the LOD dataset While the

15httplovokfnorgdatasetlovterms

factorisation of tf-idf and field normalisation factor iscommon for search engine ranking16 we add a fourthparameter - the popularity - as it is fundamental in theSemantic Web Indeed the intention of LOV is to fos-ter the reuse of consensual vocabularies that becomede facto standards The popularity metric provides anindication on how widely a term is already used (in fre-quency and in number of datasets using it) We there-fore add this new factor specific to the Semantic Webto the scoring equation

Score(t Q) = tf(t V ) lowast idf(tV)

lowastnorm(t V ) lowast pop(tD)

forallt existqi isin Q t isin σV (qi)

(4)

332 Data DumpsThe system provides two data dumps one contain-

ing the LOV vocabulary catalogue only in RDF Nota-tion 3 format17 and another one containing the LOVcatalogue along with the latest version of each vo-cabulary and the statistics of use in LOD in RDF N-quads format18(keeping each vocabulary in a separatenamed graph) As illustrated in figure 10 the RDFmodel mainly reuses the Data CATalogue Vocabulary(DCAT) which allows the representation of the LOVcatalogue as a dcatCatalog composed of vocabu-lary entries (dcatCatalogRecord) capturing in-formation like the insertion date in LOV Each en-try point to the vocabulary itself is represented by asub class of dcatDataset defined in the Vocabu-lary Of A Friend (VOAF) This artifact contains meta-data extracted by the LOV application such as creatorsfirst issued date number of occurrences of the vocab-ulary in Linked Open Data Each vocabulary is thenlinked to its various published versions represented bythe dcatDistribution entity on which informa-tion such as inter vocabulary links or languages can befound

333 SPARQL EndpointThe LOV SPARQL Endpoint19 offers a complemen-

tary data access method and allows clients to posecomplex queries to the server and retrieve direct an-swers computed over the LOV dataset We use the Jena

16See elasticsearch documentation httpbitly1e37sFL

17httplovokfnorglovn3gz18httplovokfnorglovnqgz19httplovokfnorgdatasetlovsparql

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

10 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

dcatCatalog

dctermstitledctermsdescriptiondctermslicensedctermsmodified

dcatCatalogRecord

dctermstitledctermsissueddctermsmodified

vannpreferredNamespaceUrivannpreferredNamespacePrefixdctermstitledctermsdescriptiondcatkeyworddctermsissueddctermsmodifiedrdfsisDefinedByfoafhomepagedctermscreatordctermsconstributordctermspublisherdctermslanguagevoaf occurrencesInDatasetsvoaf reusedByDatasetsvoaf reusedByVocabularies

foafprimaryTopic

revReview

dctermsdatedctermscreatorrevtext

revhasReview

voafDatasetOccurrences

voafinDatasetvoafoccurrences

voafusageInDataset

dcatDistribution

dctermsissueddctermslanguagedctermstitlevoafclassNumbervoafpropertyNumbervoafdatatypeNumbervoafinstanceNumbervoafextendsvoafspecializesvoafgeneralizesvoafhasEquivalencesWithvoafhasDisjunctionsWithvoafmetadataVocowlimports

voafVocabulary

dcatDataset

dcatdistribution

dcatrecord

Fig 10 UML class diagram representation of LOV catalogue RDF schema model

fuseki triple store to store the N-quads file contain-ing the LOV catalogue and the latest version of eachvocabulary We believe this is the first service to al-low users to to query multiple vocabularies at the sametime and to detect inter-vocabulary dependencies

34 LOV Application Program Interfaces and UserInterfaces

LOV APIs give a remote access to the many func-tions of LOV through a set of RESTful services20The basic design requirements for these APIs is thatthey should allow applications to get access to the verysame information humans do via the User InterfacesMore precisely the APIs give access through three dif-ferent services (cf figure 11) to functions related to

ndash Vocabulary terms (classes properties datatypesand instances) With these functions a softwareapplication can query the LOV search engine

20httplovokfnorgdatasetlovapidoc

ask for auto-completion or a suggestion for mis-spelled terms

ndash Vocabularies A client can get access to the cur-rent list of vocabularies contained in the LOVcatalogue search for vocabularies get auto-completion or obtain all details about a vocabu-lary

ndash Agents This provides a software agent with a listof all agents references in the LOV catalogue ameans to search for an agent get auto-completionand details about an agent

LOV APIs are a convenient means to access the fullfunctionality and data of LOV It is particularly ap-propriate for dynamic Web applications using script-ing languages such as Javascript The APIs describedabove have been developed for and follow the require-ments of Ontology Design and Data Publication tools

The LOV Website offers intuitive navigation withinthe vocabularies catalogue It allows users to explorevocabularies vocabulary terms agents and languagesand to see the connections between these entities Forinstance a user can look for experts in geography and

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 11

Fig 11 List of APIs to access LOV data

geometry domains21 We use d322 javascript library todisplay charts and make the navigation more interac-tive for example we use the star graph representationto display incoming and outgoing links between vo-cabularies (cf figure 12)

Fig 12 Schemaorg vocabulary incoming and outgo-ing links graphical representation as displayed in theUI

21httplovokfnorgdatasetlovagentsamptag=GeographyGeometry

22httpd3jsorg

4 LOV as a support for Data Publication andOntology Engineering

LOV can be used in any methodology for the cre-ation and reuse of ontologies One of the most maturemethodologies for supporting ontology development isNeOn Scenario-based it supports the collaborative as-pects of ontology development and reuse as well as thedynamic evolution of ontology networks in distributedenvironments [26]

Based on the NeOn Methodologyrsquos glossary of ac-tivities for building ontologies the LOV system is rel-evant in three activities

Ontology Search LOVrsquos primary feature is the searchof vocabulary terms These vocabularies are cate-gorised according to the domain they address Inthis way the LOV system contributes to ontologysearch by means of (a) keyword search and (b)domain browsing

Ontology Assessment LOV provides a score for eachterm retrieved by a keyword search This scorecan be used during the assessment stage

Ontology Mapping In LOV vocabularies rely oneach other in seven different ways These rela-tionships are explicitly stated using the VOAF vo-cabulary This data could be useful to find align-ments between ontologies for example one usermight be interested in finding equivalent classesfor a given class or all equivalent classes andequivalent properties among two ontologies List-ing 2 shows the SPARQL query to retrieve allthe equivalent classes between the vocabulariesfoaf and schemaorg23 Table 7 shows thealignments between foaf and schemaorg vocabu-laries

elem1 alignment elem2foafDocument owlequivalentClass schemaCreativeWorkfoafImage owlequivalentClass schemaImageObjectfoafPerson owlequivalentClass schemaPerson

Table 7 Equivalent classes and properties betweenfoaf and schemaorg

23The reader can run the query on LOV Endpoint httpbitly1Lqybcu

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

12 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Listing 2 SPARQL query asking for all the equivalentclasses and properties between the vocabularies foafand schema

1 SELECT DISTINCT elem1 alignment elem22 GRAPH g3 elem1 rdfsisDefinedBy4 lthttpxmlnscomfoaf01gt5 elem1 owlequivalentClass elem26 UNION elem2 owlequivalentClass elem17 FILTER(STRSTARTS(STR(elem2)8 httpschemaorg))9 elem1 alignment elem2

10

5 LOV Adoption

LOV supports the emergence of a rich applicationecosystem thanks to its various data access methodsWe list below some tools using our system as part oftheir service and project

51 Derived tools and applications

Maguire et al [17] use LOV search API to imple-ment OntoMaton24 a widget for bringing together on-tology lookup and tagging within the collaborative en-vironment provided by Google spreadsheets

YASGUI (Yet Another SPARQL Query GUI)25 is aclient-side JavaScript SPARQL query editor that usesthe LOV API for property and class auto-completiontogether with httpprefixcc for namespaceprefix auto-completion [22] YASGUI is itself reusedby LOV for its SPARQL Endpoint User Interface

The Datalift26 platform [24] a framework for ldquolift-ingrdquo raw data into RDF comes with a module tomap data objects and properties to ontology classesand predicates available in the LOV catalogue TheData2Ontology module takes as input a ldquoraw RDFrdquoa dataset that has been converted directly from legacyformat to triples The goal is to help publishers reuseexisting ontologies for converting their dataset witheasy discovery and interlinking

OntoWiki27 facilitates the visual presentation of aknowledge base as an information map with differentviews on instance data [3] It enables intuitive author-ing of semantic content with an inline editing mode

24httpsgithubcomISA-toolsOntoMaton25httplegacyyasguiorg26httpdataliftorg27httpontowikinet

for editing RDF content similar to WYSIWIG fortext documents OntoWiki offers a vocabulary selec-tion feature based on LOV

Furthermore we can mention the ProteacutegeacuteLOV28 aplug-in for the Proteacutegeacute editor tool [13] that aims atimproving the development of lightweight ontologiesby reusing existing vocabularies at a low fine grainedlevel The tool searches for a term in LOV via APIsand provides three actions if the term exists (i) re-places the selected term in the current ontology (ii)adds the rdfssubClassOf axiom and (iv) addsthe owlequivalentClass

52 Using LOV as a Research platform

LOV has served as the object of studies in [18]where Poveda-Villaloacuten et al analysed trends in ontol-ogy reuse methods In addition the LOV dataset hasbeen used in order to analyse the occurrence of goodand bad practices in vocabularies [19]

Prefixes in the LOV dataset are regularly mappedwith namespaces in the prefixcc service In [2] the au-thors perform alignments of Qnames of vocabularies inboth services and provide different solutions to handleclashes and disagreements between preferred names-paces Both LOV and prefixcc provide associationsbetween prefixes and namespaces but follow a differ-ent logic The prefixcc service supports polysemy andsynonymy and has a very loose control on its crowd-sourced information In contrast LOV has a muchmore strict policy forbidding polysemy and synonymyensuring that each vocabulary in the LOV database isuniquely identified by a unique prefix identification al-lowing the usage of prefixes in various LOV publica-tion URIs

The LOV query log covering the period between2012-01-06 and 2014-04-16 has been used in [6] tobuild a benchmark suite for ontology search and rank-ing The CBRBench29 benchmark uses eight rankingmodels of resources in ontologies and compares theresults with ontology engineers Our vocabulary termranking method relies on and extends the outcome ofthis work

In [15] the authors provide a 5 star rating for RDFvocabulary publication to foster interoperability queryfederation and better interpretation of data on the Websimilar o the 5 stars rating for Linked Open DataBased on LOV insertion criteria all vocabularies must

28httplabsmondecacomprotolov29httpszenodoorgrecord11121

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 13

be 5 stars using this ranking and must provide furtherquality attributes imposed by LOV to facilitate vocab-ulary reuse

RDFUnit30 is a test-driven data debugging frame-work for the Web of data In [16] the authors pro-vide an automatic test case for all available schemaregistered with LOV Vocabularies are used to encodesemantics to domain specific knowledge to check thequality of data

Governatori et al [14] analyse the current use of li-censes in vocabularies on the Web based on the LOVcatalogue in order to propose a framework to detectincompatibilities between datasets and vocabularies

6 Related work

Reusing vocabularies requires searching for termsin existing specialised vocabulary catalogues or searchengines on the Web While we refer the reader to [9]for a systematic survey of ontology repositories welist below some existing catalogues relevant to findingvocabularies

ndash Catalogues of generic vocabulariesschemas sim-ilar to LOV catalogue Example of cataloguesfalling in this category are vocaborg31 ontologies32JoinUp Semantic Assets or the Open MetadataRegistry

ndash Catalogues of ontologies for a specific domainsuch as biomedicine with the BioPortal [28]geospatial ontologies with SOCoP+OOR33 Ma-rine Metadata Interoperability and the SWEET[21] ontologies34 The SWEET ontologies in-clude several thousand terms spanning a broadextent of Earth system science and related con-cepts (such as data characteristics) with thesearch tool to aid finding science data resources

ndash Catalogues of ontology Design Patterns (ODP)focused on reusable patterns in ontology engi-neering [20] The submitted patterns are smallpieces of vocabularies that can further be inte-grated or linked with other vocabularies ODPdoes not provide a search function for specificterms as is the case with Swoogle or Watson

30httpsgithubcomAKSWRDFUnit31httpvocaborg32httpontologies33httpsontohuborgsocop34httpsweetjplnasagov

ndash Search Engines of ontology terms Among ontol-ogy search engines we can cite Swoogle [12]Watson [810] and FalconS [7] These search en-gines crawl for data schema from RDF documentson the Web They offer filtering based on ontol-ogy type (Class Property) and a ranking based onthe popularity They donrsquot look for ontology re-lations nor check if the definition of the ontologyis available (usually known as dereferenciation)While in Swoogle the ranking score is displayedWatson shows the language of the resource andthe size However none of these services provideany relationship between the related ontologiesnor any domain classification of the vocabular-ies Table 8 presents a summary of key featuresof LOV with respect to Swoogle Watson and Fal-cons

ndash Datasets and Vocabularies statistics One of themajor project on which LOV relies LODStats[11] makes a bridge between datasets and vocab-ularies LODStats gathers up to 32 different sta-tistical criteria based on a statement-stream-basedapproach for RDF datasets in Datahub35 LOD-Stats maintains a comprehensive statistics on vo-cabularies terms (ie classes properties) definedand used in a dataset Schmachtenberg et al [25]present a survey based on a large-scale LinkedData crawl from March 2014 to analyse the differ-ences in best practices adoption across differentapplication domains Their results concerning themost used vocabularies (eg foaf dcterms skosetc) and the adoption of well-known vocabulariesare inline with the findings of this paper

7 Discussion and Future Work

With many users both humans and applications andseveral years of development LOV is a mature systemthat offers a wide range of services for RDF vocabularyreuse

While LOV is delivering quality vocabularies itpresents some limitations

ndash LOV focuses on a subset of vocabularies for thedescription of RDF data It does not include anyValue Vocabularies such as SKOS thesaurus thatwould benefit from LOV services to encouragetheir reuse This limitation is due to the rather

35httpdatahubio

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

14 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Feature Swoogle Watson Falcons LOVBrowsing ontologies Yes Yes Yes YesOntology discovery method Automatic Automatic Automatic Automatic and manualScope SWDs SWDs Concepts VocabulariesRanking LOD popularity LOD popularity LOD popularity LODLOV popularity

+ labelrsquos property typeDomain filtering No No No YesComments and review No Yes No Only by curatorsWeb service access Yes Yes Yes YesSPARQL endpoint No No No YesReadWrite Read Read amp Write Read ReadOntology directory No No No YesApplication platform No No No YesStorage Cache - - Dump amp endpointInteraction with contributors No - No Yes

Table 8 Comparison of LOV with respect to Swoogle Watson and Falcons adapted from the framework presentedby drsquoAquin and Noy [9] SWD stands for Semantic Web Document

small team of curators (4 at the date of writing thispaper) Although we automated all the processesand analyses we could the support to vocabularyauthors is far from negligible

ndash LOV relies on third projects to get the valuableinformation of vocabulary usage in publisheddatasets At the moment the popularity informa-tion coming from LODStats does not take into ac-count the most recent interest (eg Schemaorg)in publishing RDF data using markup languageAs a consequence the popularity measure is in-complete and does not represent all possible useof a vocabulary

From the study of LOV as a dynamic ecosystem wecan draw the following lessons learned

ndash There is a need for vocabularies to support morelanguages Labels are the main entry point to avocabulary and their associated language is thekey Only 15 of LOV vocabularies make use ofmore than one language Multilingualism is im-portant at least for two reasons 1) the most ob-vious one is allowing users to search query andnavigate vocabularies in their native languageand 2) translating is a process through which thequality of a vocabulary can only improve Look-ing at a vocabulary through the eyes of other lan-guages and identifying the difficulties of transla-tion helps to better outline the initial concepts andif necessary refine or revise them Hence multi-lingualism and translation should be native built-

in features of any vocabulary construction not amarginal task

ndash There is at the moment no solution for long-termvocabulary preservation on the Web [4] This isa particularly important problem in a distributedand uncontrolled environment where any individ-ual can create and publish a vocabulary Thirdparties can reuse such vocabularies and thereforecreate a dependency on the original vocabularyavailability as it retains the semantics of the dataThis issue weakens the Semantic Web founda-tions

In the future we see in particular the following di-rections for advancing the LOV initiative

ndash An area that is still largely unexplored is multi-term vocabulary search During the ontology de-sign process it is common to have more than 20concepts represented using existing vocabulariesor a new one in case there is no correspondingartifact While we are able to search for relevantterms in LOV it is still the responsibility of theontology designer to understand the complex re-lationships between all these terms and come upwith a coherent ontology We could use the net-work of vocabularies defined in LOV to suggestnot only a list of terms but graphs to representseveral concepts together

ndash We would like to provide more vocabulary basedservices such as vocabulary matching to help theauthors adding more relationships to other re-

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web 15

lated vocabularies Vocabulary checking is an-other service the community is asking for Wecould integrate useful applications to LOV suchas Vapour36 RDF Triple-Checker37 or OOPS38

ndash Another research perspective is SPARQL queryextension and rewriting based on Linked Vocab-ularies Using the inter-vocabulary relationshipswe could transform the query to use the samesemantic (same vocabulary terms) as the datasource(s) to query

8 Conclusion and Future work

In this system report we presented an overview ofthe Linked Open Vocabularies initiative a high qual-ity catalogue of reusable vocabularies for the descrip-tion of data on the Web The importance of this work ismotivated by the difficulty for data publishers to deter-mine which vocabularies to use to describe their dataThe key innovations described in this article include1) the availability of a high quality vocabularies datasetthrough multiple accessing methods 2) the curation byexperts making explicit for the first time the relation-ships between vocabularies and their version historyand 3) the consideration of property semantic in termsearch scoring

The adoption and integration of the LOV cataloguein applications for vocabulary engineering reuse anddata quality are significant LOV has a central role invocabulary life-cycle on the Web of data as highlightedby the W3C39

Acknowledgments

This work has been partially supported by theFrench National Research Agency (ANR) within theDatalift Project under grant number ANR-10-CORD-009 the Spanish project BabelData (TIN2010-17550)and Fujitsu Laboratories Limited The Linked OpenVocabularies initiative is graciously hosted by theOpen Knowledge Foundation We would like to thankall the members of LOV community all the editorsand publishers of vocabularies who trust in LOV cata-logue A special thank to Phil Archer for proofreadingthis paper

36httpvalidatorlinkeddataorgvapour37httpgraphiteecssotonacukchecker38httpoopslinkeddataes39httpwwww3org2013data

References

[1] Keith Alexander and Michael Hausenblas Describing linkeddatasets-on the design and usage of void the vocabulary of in-terlinked datasets In In Linked Data on the Web Workshop(LDOW 09) in conjunction with 18th International World WideWeb Conference (WWW 09 Citeseer 2009

[2] Ghislain Auguste Atemezing Bernard Vatant Raphaeumll Troncyand Pierre-Yves Vandenbussche Harmonizing services forLOD vocabularies a Case Study In Proceedings of the Work-shop on Semantic Web Enterprise Adoption and Best PracticeCo-located with 12th International Semantic Web Conference(ISWC 2013) Sydney Australia October 22 2013 SydneyAustralia 2013

[3] Soumlren Auer Sebastian Dietzold and Thomas RiechertOntowikindasha tool for social semantic collaboration In The Se-mantic Web-ISWC 2006 pages 736ndash749 Springer 2006

[4] Thomas Baker Pierre-Yves Vandenbussche and BernardVatant Requirements for vocabulary preservation and gover-nance Library Hi Tech 31(4)657ndash668 2013

[5] Tim Berners-lee Linked data design issues for the world wideweb World Wide Web Consortium httpwww w3 orgDesig-nIssuesLinkedData html 2006

[6] Anila Sahar Butt Armin Haller and Lexing Xie Ontologysearch An empirical evaluation In The Semantic WebndashISWC2014 pages 130ndash147 Springer 2014

[7] Gong Cheng Weiyi Ge and Yuzhong Qu Falcons searchingand browsing entities on the semantic web In Proceedings ofthe 17th international conference on World Wide Web pages1101ndash1102 ACM 2008

[8] Mathieu drsquoAquin Claudio Baldassarre Laurian GridinocMarta Sabou Sofia Angeletou and Enrico Motta WatsonSupporting next generation semantic web applications 2007

[9] Mathieu drsquoAquin and Natasha F Noy Where to publish andfind ontologies a survey of ontology libraries Web SemanticsScience Services and Agents on the World Wide Web 11(0)96ndash 111 2012

[10] Mathieu drsquoAquin Marta Sabou Martin Dzbor Claudio Bal-dassarre Laurian Gridinoc Sofia Angeletou and EnricoMotta Watson A gateway for the semantic web In Poster ses-sion of the European Semantic Web Conference ESWC 2007

[11] Jan Demter Soumlren Auer Michael Martin and Jens LehmannLODStats ndash An Extensible Framework for High-performanceDataset Analytics In 18th International Conferenceon Knowledge Engineering and Knowledge Management(EKAW) pages 353ndash362 Galway Ireland 2012

[12] Timothy W Finin Li Ding Rong Pan Anupam Joshi PranamKolari Akshay Java and Yun Peng Swoogle Searching forknowledge on the semantic web In Proceedings The Twenti-eth National Conference on Artificial Intelligence and the Sev-enteenth Innovative Applications of Artificial Intelligence Con-ference July 9-13 2005 Pittsburgh Pennsylvania USA pages1682ndash1683 2005

[13] Nuria Garciacutea-Santa Ghislain Auguste Atemezing and BorisVillazoacuten-Terrazas The proteacutegeacute LOVPlugin Ontology Accessand Reuse for Everyone In The 12th Extented Semantic WebConference (ESWC2015)

[14] Guido Governatori Ho-Pun Lam Antonino Rotolo SerenaVillata Ghislain Atemezing and Fabien Gandon Checking li-censes compatibility between vocabularies and data In Pro-ceedings of the 5th International Workshop on Consuming

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011

16 Pierre-Yves V et al LOV a gateway to reusable semantic vocabularies on the Web

Linked Data (COLD 2014) co-located with the 13th Interna-tional Semantic Web Conference (ISWC 2014) Riva del GardaItaly October 20 2014 2014

[15] Krzysztof Janowicz Pascal Hitzler Benjamin Adams DaveKolas and Charles Vardeman II Five stars of linked data vo-cabulary use Semantic Web 5(3)173ndash176 2014

[16] Dimitris Kontokostas Patrick Westphal Soumlren Auer Sebas-tian Hellmann Jens Lehmann Roland Cornelissen and Am-rapali Zaveri Test-driven evaluation of linked data qualityIn Proceedings of the 23rd International Conference on WorldWide Web WWW rsquo14 pages 747ndash758 Republic and Cantonof Geneva Switzerland 2014 International World Wide WebConferences Steering Committee

[17] Eamonn Maguire Alejandra Gonzaacutelez Beltraacuten Patricia LWhetzel Susanna-Assunta Sansone and Philippe Rocca-Serra Ontomaton a bioportal powered ontology widget forgoogle spreadsheets Bioinformatics 29(4)525ndash527 2013

[18] Mariacutea Poveda-Villaloacuten Mari Carmen Suaacuterez-Figueroa andAsuncioacuten Goacutemez-Peacuterez The landscape of ontology reuse inlinked data 2012

[19] Mariacutea Poveda-Villaloacuten Bernard Vatant Mariacutea Carmen Suaacuterez-Figueroa and Asuncioacuten Goacutemez-Peacuterez Detecting good prac-tices and pitfalls when publishing vocabularies on the web2013

[20] Valentina Presutti and Aldo Gangemi Content ontology de-sign patterns as practical building blocks for web ontologiesIn Spaccapietra S et al editor ER rsquo08 Proceedings of the27th International Conference on Conceptual Modeling pages128ndash141 Springer-Verlag 2008

[21] Robert G Raskin and Michael J Pan Knowledge representa-tion in the semantic web for earth and environmental terminol-ogy (sweet) Computers amp Geosciences 31(9)1119 ndash 1125

2005[22] Laurens Rietveld and Rinke Hoekstra Yasgui Not just an-

other sparql client In The Semantic Web ESWC 2013 SatelliteEvents pages 78ndash86 Springer 2013

[23] Johann Schaible Thomas Gottron Stefan Scheglmann andAnsgar Scherp Lover support for modeling data using linkedopen vocabularies In Proceedings of the Joint EDBTICDT2013 Workshops pages 89ndash92 ACM 2013

[24] Franccedilois Scharffe Ghislain Atemezing Raphaeumll Troncy Fa-bien Gandon Serena Villata Beacuteneacutedicte Bucher Fayccedilal HamdiLaurent Bihanic Gabriel Keacutepeacuteklian Franck Cotton JeacuterocircmeEuzenat Zhengjie Fan Pierre-Yves Vandenbussche andBernard Vatant Enabling linked-data publication with thedatalift platform In 26th Conference on Artificial Intelligence(AAAI-12) 2012

[25] Max Schmachtenberg Christian Bizer and Heiko PaulheimAdoption of the linked data best practices in different topicaldomains In The Semantic WebndashISWC 2014 pages 245ndash260Springer 2014

[26] Maria del Carmen Suaacuterez-Figueroa NeOn Methodology forBuilding Ontology Networks Specification Scheduling andReuse PhD thesis Universidad Politecnica de Madrid SpainJune 2010 httpoaupmes3879

[27] Pierre-Yves Vandenbussche and Bernard Vatant Metadata rec-ommendations for linked open data vocabularies Technicalreport 2012

[28] Patricia L Whetzel Natalya F Noy Nigam H Shah Paul RAlexander Csongor Nyulas Tania Tudorache and Mark AMusen Bioportal enhanced functionality via new web ser-vices from the national center for biomedical ontology to ac-cess and use ontologies in software applications Nucleic acidsresearch 39(suppl 2)W541ndashW545 2011