In 2012, the University of Idaho Library began implementing VIVO, an open-source Semantic Web application, both as a discovery layer for its fledgling institutional repository and as a database to describe, visualize, and report university research activity. The presenters will detail some of the challenges they encountered developing this resource, while discussing the tools and techniques they used for obtaining, editing, and uploading institutional data into the RDF-based VIVO system.
VIVO at the University of Idaho
SHINY HAPPY PEOPLE HOLDING NODES: USING VIVO (A SEMANTIC WEB APPLICATION) TO REVEAL UNIVERSITY OF IDAHO RESEARCH AND RESEARCHERS
What is VIVO?
An Open-Source …
Freely available with a community of librarians and web developers
Semantic Web application …
Data structured so that it can be shared and reused
using Linked Data practices and standards…
RDF (Resource Description Framework) Triples, which are controlled subject-predicate-object expressions that produce consistent relationships
and Data Harvesting procedures
Collecting, ingesting and publishing (public/private) data in batches
to create a searchable, browseable, and reusable network of information on research and researchers.
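To make the subject-predicate-object idea concrete, here is a minimal sketch that writes two triples in N-Triples syntax using only the Python standard library. The base URI and the individual "n1001" are hypothetical placeholders, not values from our installation; the rdf:type, rdfs:label, and foaf:Person terms are standard vocabulary URIs.

```python
# Hypothetical base URI for illustration; a real VIVO instance defines its own.
BASE = "http://vivo.example.edu/individual/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Each tuple is one (subject, predicate, object) statement.
triples = [
    (uri(BASE + "n1001"), uri(RDF_TYPE), uri(FOAF_PERSON)),
    (uri(BASE + "n1001"), uri(RDFS_LABEL), literal("Researcher, Example")),
]


def to_ntriples(rows):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```

Because every statement has the same three-part shape, data from different sources can be merged simply by combining their triples.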
Early History of VIVO
1997-2005: VIVO Network idea developed at Cornell for life and social sciences.
Intended to provide a view of sciences and research “across disciplinary and administrative boundaries.”
2005: Released for Life Sciences
2007: Expanded to all of Cornell University (through the Library)
2009: $12.2 million NIH grant provided to develop a national version with several other partners
2010 – Present: More and more institutions adopting and developing VIVO instances
from “VIVO: Enabling National Networking of Scientists”
VIVO at the University of Idaho
Spring 2012 – Fall 2012
Approached by Idaho INBRE (a biomedical researcher network in Idaho) with a question about possibly installing a VIVO instance
Installed VIVO and began setting up and learning the system while gathering feedback from INBRE and other stakeholders
Garnered approval from INBRE faculty to publish their information in the system
Harvested INBRE-related information from public resources: PubMed and the NIH and NSF grants databases
VIVO at the University of Idaho
Spring 2013
Began to pursue expanded VIVO
Received approval from institutional IT evaluation group to go forward
Re-branded instance
Presented VIVO to library faculty and administration as a possible project going forward
Presented instance and proposal for new position to VP of Research
VIVO at the University of Idaho
Summer 2013
VP approved expanded use of VIVO for research groups on campus and funding for position
Annie Gaines began as Scholarly Communication Librarian
Ingest, ingest, ingest
Added three additional research groups, as well as the Law School, and associated faculty
Added thousands of grants, publications, and people to the system
VIVO at the University of Idaho
Fall 2013
Presented VIVO publicly on campus for first time
VIVO goes live (accessible from off campus)
Additional organizational descriptions added (Department, College, Grant Structures, etc.)
Gained approval and access to use campus database system, Banner
VIVO at the University of Idaho
VIVO Today
Beginning to explore VIVO as front-end for historical documents
Adding all University Faculty
Creating applications and access points for data
Cleaning, always cleaning …
Using this presentation as a prompt for further development of application, as well as further defining:
the system’s presentation
our data’s preservation
and our mission and goals in using the system
Hosting
Provided by the Northwest Knowledge Network
www.northwestknowledge.net
NKN focuses on providing technical support to researchers
Division of UI’s Office of Research
Strong relationship with the UI Library (they are in the building)
Data is replicated to a data center at Idaho National Laboratory
Presents future opportunities for integrating VIVO’s information with other research-related tools/systems
Technical Specs
Our installation
Red Hat Linux
Apache Web Server
MySQL
Tomcat
Current Version of VIVO
1.5.2
Will probably upgrade to 1.6 in March 2014
Building VIVO – Two Approaches
Approach #1 – the high-resource approach (ideal)
Requires
Discrete IT department
Available programmers and developers
Formal IT project management
Advantages
High-level of integration into existing systems/services
Advanced customization and configuration
Reasonably short time from inception to production
Disadvantages
Red-tape
Represents a large commitment by the unit
Building VIVO – Two Approaches
Approach #2 – the low-resource approach (practical)
Requires
Minimum recommended staff identified in the VIVO implementation guide
Experimental mindset
View VIVO as a series of small projects, rather than one large integration into university activities
Advantages
Simple
Manageable
Disadvantages
Time (takes much longer)
Integration with existing services
Creation of custom data ingest tools
Implementation Goals
Start with low-hanging fruit: it is easier to collect
When considering custom tools and processes, our priorities:
1 – re-use from community or locally
2 – buy if possible
3 – build as needed
Build institutional interest in the existing data before soliciting more resources to further our development
Investigate third-party solutions (Symplectic Elements) as alternatives to custom-building internal methods of collecting data
Data Ingestion - General
Typical workflow:
1. Receive data in source format
2. Convert to RDF (usually RDF/XML or Turtle)
3. Associate with VIVO ontology (as needed)
4. Reconcile against existing database
5. Load into the application
6. Re-index if needed
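The workflow above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not our production process: the base URI, the CSV columns ("id", "title"), and the sample rows are invented, and a plain Python set stands in for the VIVO triple store. Re-indexing (step 6) happens inside VIVO and is not modeled.

```python
import csv
import io

BASE = "http://vivo.example.edu/individual/"  # hypothetical base URI
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def read_source(text):
    """Step 1: receive data in a source format (CSV text here)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_triples(rows):
    """Steps 2-3: turn each row into a triple that uses a shared vocabulary term."""
    return {
        (f"<{BASE}{row['id']}>", RDFS_LABEL,
         '"' + row["title"].replace('"', '\\"') + '"')
        for row in rows
    }


def reconcile(new, existing):
    """Step 4: keep only the triples that are not already in the store."""
    return new - existing


def load(store, triples):
    """Step 5: add triples to the store and report how many were added."""
    store |= triples
    return len(triples)


# Invented sample data; pub1 is already present in the stand-in store.
source = "id,title\npub1,Example title A\npub2,Example title B\n"
store = {(f"<{BASE}pub1>", RDFS_LABEL, '"Example title A"')}

added = load(store, reconcile(to_triples(read_source(source)), store))
print(added, len(store))
```

Keeping each step as its own function mirrors how the stages can be swapped out: a different source format only changes `read_source`, and a different matching rule only changes `reconcile`.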
Data Ingestion - Sources
Public Sources
NSF, NIH, USDA Awards
PubMed
Commercial Sources
Web of Science
Must remove “intellectual effort”
CVs, Publication Lists
Must have some means of soliciting them
Local Databases (central university, research groups)
Several institutional sources
Must work through the gatekeepers of each
Need data security review to ensure that institutional concerns are met before public exposure
Data Ingestion - Tools
VIVO Harvester
Extract, Transform, and Load (ETL) tool that takes data from a source and loads it into VIVO automatically
OpenRefine
Data cleaning tool
Very flexible for different datatypes
Extension enables export in RDF format
Reconciliation service allows us to match and de-duplicate entries before export
Custom Conversion Tools (in Python)
Used for CRIS reports output, as well as other consistent, but unusual formats
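Matching incoming names against names already in the system is the core of the reconciliation step. The sketch below shows the general idea with Python's standard difflib module; it is a stand-in for illustration and is not the algorithm OpenRefine's reconciliation service uses. The 0.85 threshold and the sample names are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and drop commas, periods, and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def best_match(candidate, existing, threshold=0.85):
    """Return the most similar existing name, or None if nothing is close enough."""
    scored = [
        (SequenceMatcher(None, normalize(candidate), normalize(name)).ratio(), name)
        for name in existing
    ]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


# Invented sample names.
known = ["Smith, Jane A", "Jones, Bob"]
print(best_match("Smith, Jane A.", known))  # differs only by punctuation
print(best_match("Nguyen, Tran", known))    # no close match in the list
```

A candidate that returns None would be created as a new entry; anything that matches would be merged with the existing record, ideally after a human check.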
Ontology Extensions
Custom University of Idaho model prefixed with “uidaho:”
Goals with our extensions
Re-use as much as possible
Establish the local need before creating
Always associate classes within the VIVO hierarchy so that data is not fully reliant on uidaho for context
Examples
Members of Idaho EPSCoR, Idaho INBRE, REACCH-PNA
Non-UI/Courtesy Faculty
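As an illustration of associating a local class with the VIVO hierarchy, the sketch below generates a small Turtle declaration in which a local class is a subclass of an existing VIVO class. The uidaho namespace URI and the class name "CourtesyFaculty" are hypothetical, and the choice of vivo:FacultyMember as the parent is an assumption to be checked against the installed ontology.

```python
VIVO = "http://vivoweb.org/ontology/core#"
UIDAHO = "http://vivo.example.edu/ontology/uidaho#"  # hypothetical namespace URI


def subclass_turtle(local_name, label, parent="vivo:FacultyMember"):
    """Build Turtle text declaring a local class under an existing parent class."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix vivo: <{VIVO}> .",
        f"@prefix uidaho: <{UIDAHO}> .",
        "",
        f"uidaho:{local_name} a owl:Class ;",
        f'    rdfs:label "{label}" ;',
        f"    rdfs:subClassOf {parent} .",
    ]
    return "\n".join(lines)


# Hypothetical class name used only for this example.
turtle = subclass_turtle("CourtesyFaculty", "Courtesy Faculty")
print(turtle)
```

Because of the rdfs:subClassOf statement, a consumer that knows only the VIVO ontology can still treat these individuals as members of the parent class.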
Data Re-use - Fuseki
Apache Jena - Fuseki project
jena.apache.org/documentation/serving_data/
Enables external access to VIVO data
Without Fuseki, data re-use is limited to those authenticated with the system
Created examples of data re-use to assist in marketing efforts
Goal: to establish the added value of putting data in VIVO
Example: labs that need to report the results of their research could use this feature to create publication lists or to display spatial, temporal, or conceptual aspects of UI research to stakeholders or students
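External re-use generally means sending a SPARQL query to the Fuseki endpoint over HTTP and reading back JSON. The sketch below builds such a request URL and flattens a response in the standard SPARQL JSON results format. The endpoint URL and the sample payload are invented for illustration; no request is actually sent.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://fuseki.example.edu/vivo/query"  # hypothetical endpoint URL


def query_url(sparql, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


def rows_from_results(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    names = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in data["results"]["bindings"]
    ]


url = query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")

# Invented sample response in the SPARQL JSON results format.
sample = (
    '{"head": {"vars": ["label"]}, "results": {"bindings": '
    '[{"label": {"type": "literal", "value": "Example Lab"}}]}}'
)
rows = rows_from_results(sample)
print(url)
print(rows)
```

In practice the URL would be fetched with an HTTP client that asks for JSON results, and the returned text passed to `rows_from_results`.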
Data Re-use - Fuseki
Example 1:
A very simple way to look at awards data. This presents the number of awards by agency. It uses a JavaScript library called Sgvizler to turn JSON data from Fuseki into a Google Charts visualization.
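A query behind a chart like this might look like the sketch below, which counts grants per awarding organization and reshapes the JSON results into the header-plus-rows table that charting libraries typically expect. The predicate vivo:grantAwardedBy is an assumption about the VIVO core ontology for this version and should be checked against the installed ontology; the sample results are invented.

```python
# SPARQL text only; the class and predicate names are assumptions to verify.
AWARDS_BY_AGENCY = """
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?agency (COUNT(?grant) AS ?awards)
WHERE {
  ?grant a vivo:Grant ;
         vivo:grantAwardedBy ?org .
  ?org rdfs:label ?agency .
}
GROUP BY ?agency
ORDER BY DESC(?awards)
"""


def chart_table(results):
    """Turn parsed SPARQL JSON results into [header, row, row, ...]."""
    table = [["Agency", "Awards"]]
    for binding in results["results"]["bindings"]:
        table.append([binding["agency"]["value"], int(binding["awards"]["value"])])
    return table


# Invented sample results, already parsed from JSON.
sample_results = {
    "head": {"vars": ["agency", "awards"]},
    "results": {"bindings": [
        {"agency": {"type": "literal", "value": "Agency A"},
         "awards": {"type": "literal", "value": "3"}},
        {"agency": {"type": "literal", "value": "Agency B"},
         "awards": {"type": "literal", "value": "1"}},
    ]},
}

table = chart_table(sample_results)
print(table)
```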
Data Re-use - Fuseki
Example 2:
Another simple view using Sgvizler. This shows a comparison of two variables, awards and publications, for personnel in a specific research group. It would need work as a formal graph, but it points to the way that the data can be re-used.
Data Re-use - Fuseki
Example 3:
Another simple example of data re-use, using a JavaScript/Ajax technique to display a list of journal titles and faculty within a specific research group. Links to the faculty members’ VIVO profiles are associated with their names.
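The same kind of list can be produced on the server side. The sketch below is a Python stand-in for the JavaScript/Ajax technique, not the code used on the slide: it renders query rows as an HTML list with each name linked to a profile. The field names ("name", "profile", "journal") and the sample row are hypothetical.

```python
from html import escape


def profile_list(rows):
    """Render rows as an HTML list, linking each name to its profile URL."""
    items = [
        f'<li><a href="{escape(r["profile"], quote=True)}">{escape(r["name"])}</a>: '
        f'<em>{escape(r["journal"])}</em></li>'
        for r in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Invented sample row; real rows would come from a SPARQL query.
sample_rows = [
    {
        "name": "Researcher, Example",
        "profile": "http://vivo.example.edu/individual/n1001",
        "journal": "Journal of Examples & Tests",
    }
]

html_list = profile_list(sample_rows)
print(html_list)
```

Escaping every value before it goes into the markup matters here, since titles and names harvested from outside sources can contain characters such as "&" or "<".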
VIVO as
Institutional
Repository
Background
When Annie was brought on for Scholarly Communications, one of her tasks was to develop an IR for the UI.
Some potential platforms to use for UI IR:
CONTENTdm – too flat
Bepress – too expensive
VIVO?
‘Institutional repositories’
“A set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.”
Clifford Lynch, ARL Bimonthly Report 226, Feb. 2003.
“Digital collections that capture and preserve the intellectual output of university communities.”
Raym Crow, The Case for Institutional Repositories, SPARC, 2002
‘Institutional repositories’
Are:
Institutionally defined and managed
Collection of scholarly work
Both cumulative and perpetual
Open
Provide:
Long term preservation
Wide dissemination
Showcase for scholars and the institution
Challenges
Copyright issues, varying access
Buy-in from faculty, voluntary submissions
Getting people to care
VIVO as IR?
Not your typical IR interface
Dynamic browsing and searching
Interconnectedness in a large network
Includes diverse materials, not just article pre-prints
Includes citations for all works, not just the ones hosted in the IR
Linked data format allows for reuse of data for a variety of purposes
The following page shows a thesis document in VIVO
Theory vs. Practice
Although VIVO can act as a front end, the documents must be hosted elsewhere
We deposit our docs in CONTENTdm and link to the PDF in VIVO
This makes things easier, but also more complicated
See example of the same thesis document in CONTENTdm on the next page
Theory vs. Practice
We wanted to close this presentation by asking the group some questions. If you have any advice for us on this project, we would love to hear from you!
Are more access points better or more confusing?
Should we include historical documents in the VIVO IR?
Which page should be the main collection?
Should we provide links to all collections? Or link from one into the other?
What are best practices with unusually constructed IRs?
Thank you!