Managing Metadata for Science, Technology and Innovation Studies: The RISIS Case
Al Koudous Idrissou, Ali Khalili, Rinke Hoekstra and Peter van den Besselaar
Vrije Universiteit Amsterdam / University of [email protected]
About RISIS
• Started in January 2014
• 4 years
• 13 partners from 10 countries
• 7 universities
• 6 public research organizations
• Goal: promote a distributed research infrastructure to advance science & innovation studies
Science & Technology Studies
• Study the dynamics of scientific ideas.
• Interaction between academia, business and government.
• Highly interdisciplinary: social sciences, economics, political science, humanities
• Highly heterogeneous data: structured vs. unstructured, qualitative vs. quantitative
The RISIS Project
• "an explosion of experimental datasets since 2000 … mostly thanks to EC supported project"
• A distributed research infrastructure to advance science & innovation studies
• Serving research: consolidate and integrate existing datasets; complement with new datasets on key issues currently not covered; develop software platforms to support research (extract, integrate, structure and treat semantic web data)
• Serving society: a radically improved evidence base for research & innovation policies
Six Use Cases
1. Where do what types of firms innovate, how do they develop, where do they grow fastest?
2. How stable and large are EU-promoted networks? How do joint funding and emerging science & technologies affect Europe?
3. What is the quality and extent of public sector research? Build registers at a European level and integrated views of excellence (Leiden Ranking etc.)
4. Track the careers of researchers across borders
5. Effect and impact of research & innovation studies
6. Develop integrated data and tools for researchers in the field
The Types of Data in SMS
[Figure: data integration overview. Entity types: Organization, Higher Education, Firm, Funding Body, Person, Product, Agreement, Policy, Evaluation, Location, Publication, Patent, Project, Investment, Funding Program. Datasets: CIB, ETER, EUPRO, JOREP, Leiden-Ranking, MORE I, Nano, Profile, SIPER, VICO.]
Semantically Mapping Science (SMS)
[Figure: SMS architecture. Data sources: RISIS Private Data and RISIS Public Data (databases), described with VoID and RDF, plus [Linked] Open Data. Core: [Linked Data] API, Data Cache (Triple Store), Public Data Access Methods (SPARQL, API, RSS, …). Services: Meta-data Services, Category Services, Basic Geo Services, Innovative Geo Services, Named Entity Recognition, Access Control Service, Domain Adaptation Service, Identifier Management Service, Identity Resolution Service. Applications: Apps, Data Viz. & Exploration views, Interoperability with corTEXT, Integration with local datasets, Integration with public datasets, Integration with social data.]
But wait a moment… hasn't this been done before?
• … solve a similar data integration problem: pharma (OpenPHACTS), socio-economic history, linguistics, media (CLARIAH, CEDAR, etc.)
• … solve a similar data search, indexing and cataloguing problem: datahub.io, lodlaundromat.org
• … solve similar metadata representation problems: DCAT, VoID, etc.
• data privacy
• data licensing
• data paywall
• physical location
Semantically Mapping Science (SMS)
[Figure: SMS architecture with Access Control Points. Data sources: RISIS Private Data and RISIS Public Data (databases), annotated with the labels "convert", "metadata", "RDF metadata", "RDF store" and "VoID", plus [Linked] Open Data. Core: SMS [Linked Data] API, Data Cache (Triple Store). Services: Meta-data Services, Category Services, Basic Geo Services, Innovative Geo Services, Named Entity Recognition, Domain Adaptation Service, Identifier Management Service, Identity Resolution Service. Applications: Apps, Data Viz. & Exploration views, Interoperability with corTEXT, Integration with local datasets, Integration with public datasets, Integration with social data.]
How can we still provide an integrated view on this data?
Do existing vocabularies suffice?
How do experts assess the suitability of a dataset?
• Knowledge acquisition & elicitation with experts: interviews -> first design -> user experiences -> revise & adapt
• Distinguish between private, publicly accessible, and other public data.
1. User-friendly web interface for viewing dataset metadata;
2. Show conditions under which the data can be used;
3. Provide detailed information about the dataset, to
4. Enable users to gain an in-depth understanding of the data;
5. Facilitate trust (quality assessment);
6. Allow for both simple and advanced search (background knowledge)
Operationalisation
1. User interface: categorisation of different types of metadata, non-technical terms, hints
2. Usage conditions: legal aspects, access conditions, but also technical (data format, size, model)
3. Information & Understanding: overview, content description, temporal aspects, structure of the data
4. Trust: provenance and origin of data; when, how, and by whom it was created
5. Search: all of the above + the use of external knowledge sources to show connections
Fig. 2. RISIS metadata coverage overview through knowledge type categorization.
Technical aspects. RISIS's metadata provides information about the dataset model used. This indicates whether the dataset follows the traditional tabular model (Relational, Spreadsheet, etc.) or the graph model (RDF). It also covers other information such as the format and the size of the dataset.
Legal aspects. The legal aspects of a dataset are covered by the RISIS metadata through a license, which explicitly determines the terms under which a dataset can be used; rights, which provide information such as property and intellectual rights associated with the data; terms of use, which describe non-binding conditions; access conditions and visit conditions, which respectively describe the conditions under which end-users can access or visit a dataset; and a non-disclosure agreement, which specifies the conditions of access to confidential information that require signing a non-disclosure agreement with the dataset holder(s).
Access. To inform the data consumer on how to access or query the data, the metadata provides information such as the opening status, which indicates whether the data is open for visit, and the access type, which specifies whether the data can be visited, requested or both, or whether access to the data is free. In addition, it provides the access URL, which points to the landing page, feed, SPARQL endpoint or other type of resource that grants access to the distribution of the dataset, and the data download address, which gives the location of the dataset for download.
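The technical, legal and access aspects described above can be pictured as fields of a single metadata record. The following is a minimal sketch in Python and not the RISIS implementation: the class name `DatasetMetadata`, its field names, the allowed values of `access_type` and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical record grouping the aspects described above."""
    title: str
    # Technical aspects
    data_model: str        # e.g. "relational", "spreadsheet", "rdf"
    data_format: str
    size_bytes: int
    # Legal aspects
    license: str
    requires_nda: bool
    # Access
    open_for_visit: bool   # the "opening status"
    access_type: str       # assumed values: "visit", "request", "both", "free"
    access_url: Optional[str] = None
    download_url: Optional[str] = None


def access_summary(meta: DatasetMetadata) -> str:
    """Return a one-line, non-technical description of how to reach the data."""
    if meta.access_type == "free" and meta.download_url:
        return f"{meta.title}: freely downloadable at {meta.download_url}"
    steps = []
    if meta.requires_nda:
        steps.append("sign a non-disclosure agreement")
    if meta.access_type in ("request", "both"):
        steps.append("submit an access request")
    if meta.access_type in ("visit", "both") and meta.open_for_visit:
        steps.append("arrange a visit")
    if not steps:
        return f"{meta.title}: currently not accessible"
    return f"{meta.title}: " + ", then ".join(steps)


# Made-up values, not a real RISIS catalogue entry.
example = DatasetMetadata(
    title="Example dataset",
    data_model="relational",
    data_format="CSV",
    size_bytes=1_000_000,
    license="custom",
    requires_nda=True,
    open_for_visit=True,
    access_type="both",
)
print(access_summary(example))
```

Keeping the access fields next to the legal ones lets a catalogue describe a dataset that cannot itself be published, which is the situation the metadata is meant to address.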
Data quality. All the above information could be used to assess the quality of a dataset. However, to specifically assess the work done by dataset providers
In more detail
and generic domain of the problem, SMS is intended to be useful not only for STIS but also for the humanities and social sciences.
5 Conclusions & Future Work
This paper presents an approach for managing metadata in the field of science, technology and innovation studies. The approach was developed and applied in the context of the RISIS-SMS project with the goal of supporting data integration, discovery and search across datasets, maintaining privacy, and obtaining user trust while focussing on data that are not directly accessible. A contribution of this work is the set of requirements elicited by interviewing the stakeholders. The requirement analysis guided the design of a new vocabulary, together with a review of existing metadata vocabularies that helped us fill in part of the metadata needed to accommodate the domain needs. Additionally, to meet the requirements, we designed and implemented a user-friendly interface which allows non-expert users to easily author metadata in RDF.
As future work, we envisage extending our vocabulary to cover aspects related to the quality and provenance of data. We also plan to conduct a usability evaluation with end-users of the system to ensure that our user interface and metadata specifications fulfil the user needs.
References
1. P. Ciccarese, S. Soiland-Reyes, K. Belhajjame, A. J. Gray, C. Goble, and T. Clark. PAV ontology: provenance, authoring and versioning. Journal of Biomedical Semantics, 4(1):1–22, 2013.
2. C. Daraio, M. Lenzerini, C. Leporelli, H. F. Moed, P. Naggar, A. Bonaccorsi, and A. Bartolucci. Data integration for research and innovation policy: an ontology-based data management approach. Scientometrics, pages 1–15, 2015.
3. P. Groth, A. Loizou, A. J. Gray, C. Goble, L. Harland, and S. Pettifer. API-centric linked data integration: The Open PHACTS discovery platform case study. Web Semantics: Science, Services and Agents on the World Wide Web, 29:12–18, 2014.
4. E. J. Hackett, O. Amsterdamska, M. Lynch, and J. Wajcman. The handbook of science and technology studies. The MIT Press, 2008.
5. A. Khalili, A. Loizou, and F. van Harmelen. Adaptive linked data-driven web components: Building flexible and reusable semantic web interfaces. Semantic Web Conference (ESWC) 2016, 2016.
6. J. P. McCrae, P. Labropoulou, J. Gracia, M. Villegas, V. Rodríguez-Doncel, and P. Cimiano. One ontology to bind them all: The META-SHARE OWL ontology for the interoperability of linguistic datasets on the web. In The Semantic Web: ESWC 2015 Satellite Events, pages 271–282. Springer, 2015.
7. A. Meroño-Peñuela, A. Ashkpour, M. Van Erp, K. Mandemakers, L. Breure, A. Scharnhorst, S. Schlobach, and F. Van Harmelen. Semantic technologies for historical research: A survey. Semantic Web, 6(6):539–564, 2014.
8. P. Van den Besselaar. The cognitive and the social structure of STS. Scientometrics, 51(2):441–460, 2001.
Fig. 4. The RISIS Ontology and the vocabularies it reuses.
for RDF data”. Figure 3 illustrates the mapping between the RISIS requirements and existing shared vocabularies. Yet, reusing all the above vocabularies does not entirely satisfy the RISIS need for describing a dataset. This forced the creation of new vocabulary terms such as risis:usecase or risis:accessConditions (see Figure 3) for concepts that are not covered by any of the selected vocabularies.
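To make the mix of reused and new terms concrete, the sketch below serialises one dataset description that combines a Dublin Core property with a RISIS-specific one. It is an illustration only: the namespace URIs for Dublin Core terms, VoID and FOAF are the published ones, but the `risis` namespace URI, the subject URI, the helper names `expand` and `to_ntriples`, and the literal values are assumptions, since the source does not give them.

```python
# Namespace URIs for the reused vocabularies are the published ones;
# the RISIS namespace is a placeholder, not the project's real URI.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "risis": "http://example.org/risis#",  # hypothetical
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject: str, properties: dict) -> str:
    """Serialise literal-valued properties of one dataset as N-Triples lines."""
    lines = []
    for curie, value in properties.items():
        # Escape the characters that would break a quoted literal.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{subject}> <{expand(curie)}> "{escaped}" .')
    return "\n".join(lines)


# Made-up description mixing a reused property with a RISIS-specific one.
description = {
    "dcterms:title": "Example dataset",
    "risis:accessConditions": "On-site visit after approval",
}
print(to_ntriples("http://example.org/dataset/1", description))
```

Only plain string literals are handled here; typed literals, language tags and URI-valued properties would need additional cases.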
6 User-friendly authoring of metadata
As already mentioned in Section 2, the RISIS metadata about datasets is modeled in RDF. The Resource Description Framework allows metadata to be shared and facilitates integration in a structured and semantically machine-interpretable way across different applications exploiting the metadata. The adoption of RDF as a data model for the RISIS project raised the problem that the auto-generated metadata need to be manipulated by data owners who are not familiar with Semantic Web technologies or with the ways to generate standard and valid RDF. To tackle this issue, we created a graphical user interface (UI) that enables non-expert RISIS data owners to generate and update their dataset metadata.
To design the RISIS graphical UI for editing RDF metadata, we followed a user-centered approach in which we first collected the UI requirements by interviewing the potential end-users of the RISIS platform. We summarize here the set of features which needed to be supported by the metadata editor: (1) render metadata properties in different categories; (2) avoid presenting technical metadata properties (e.g. RDF dump, byte size) to the user; (3) support metadata properties with hints that explain the meaning of the property; (4) support the user with human-readable information, for example by avoiding the display of full URIs; (5) facilitate inserting metadata values which follow a certain pattern (e.g. DateTime values, URLs, etc.).
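Features (4) and (5) lend themselves to a short illustration. The sketch below is one possible way to shorten full URIs for display and to check pattern-based values, written with the Python standard library only. It is not the code of the RISIS editor: the function names `display_label` and `is_valid_value`, the `kind` labels and the choice of ISO 8601 for date-time values are assumptions.

```python
from datetime import datetime
from urllib.parse import urlparse

# Published namespace URIs mapped to their usual prefixes.
PREFIXES = {
    "http://purl.org/dc/terms/": "dcterms",
    "http://rdfs.org/ns/void#": "void",
    "http://xmlns.com/foaf/0.1/": "foaf",
}


def display_label(uri: str) -> str:
    """Feature (4): show 'dcterms:title' instead of the full URI."""
    for namespace, prefix in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # unknown namespace: fall back to the full URI


def is_valid_value(kind: str, value: str) -> bool:
    """Feature (5): check values that must follow a certain pattern."""
    if kind == "datetime":
        try:
            # Assumption: date-time values are entered in ISO 8601 form.
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if kind == "url":
        parts = urlparse(value)
        return parts.scheme in ("http", "https") and bool(parts.netloc)
    return True  # no pattern is known for this kind of value


print(display_label("http://purl.org/dc/terms/title"))
print(is_valid_value("url", "https://www.w3.org/TR/void/"))
```

An editor would typically run such checks while the user types, so that invalid values never reach the stored RDF.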
Fig. 3. RISIS's Ontology. A view over mapped vocabularies reused.
respectively The Dublin Core Metadata Element Set⁹, which is a ”vocabulary of fifteen properties for use in resource description”; The PROV Ontology¹⁰, which is used to provide provenance descriptions; The Vocabulary of Interlinked Datasets (VoID)¹¹, which is a data-model-specific vocabulary for expressing metadata about RDF datasets; and The Friend of a Friend vocabulary¹² for describing persons. Although provenance is not shown in Figure 3, we discuss it here as it has been used extensively behind the scenes for describing data manipulations.
Other reused vocabularies that cover fewer of the RISIS requirements include DCAT, which is primarily a ”vocabulary designed to facilitate interoperability between data catalogs published on the Web”; DISCO¹³, which is a vocabulary for documenting research and survey data; WAIVER¹⁴, which is a vocabulary for waivers of rights; The Provenance, Authoring and Versioning ontology (PAV) [1], which is a ”lightweight ontology for capturing just enough descriptions essential for tracking the provenance, authoring and versioning of web resources”; The Simple Knowledge Organization System (SKOS)¹⁵, which is ”a common data model for sharing and linking knowledge organization systems via the Semantic Web”; and RDF Schema¹⁶, which is a ”data-modeling vocabulary
⁹ http://dublincore.org/documents/dces/
¹⁰ https://www.w3.org/TR/2013/REC-prov-o-20130430/
¹¹ https://www.w3.org/TR/void/
¹² http://xmlns.com/foaf/spec/
¹³ http://rdf-vocabulary.ddialliance.org/discovery.html
¹⁴ http://vocab.org/waiver/terms
¹⁵ https://www.w3.org/TR/swbp-skos-core-spec
¹⁶ https://www.w3.org/TR/rdf-schema/
Ali Khalili, Antonis Loizou and Frank van Harmelen. Adaptive Linked Data-driven Web Components: Building Flexible and Reusable Semantic Web Interfaces
Discussion
• Science & innovation studies thrives on diverse and heterogeneous data
• Existing platforms do not take access restrictions into account, or
• … they do not provide sufficiently descriptive metadata to support research
• We performed a requirements analysis for minimal metadata needs
• Resulting in a vocabulary that integrates and connects existing standards, and
• … drives a Linked Data driven data search portal.