VIVO at the University of Idaho

SHINY HAPPY PEOPLE HOLDING NODES: USING VIVO (A SEMANTIC WEB APPLICATION) TO REVEAL UNIVERSITY OF IDAHO RESEARCH AND RESEARCHERS

What is VIVO?

An Open-Source …

Freely available with a community of librarians and web developers

Semantic Web application …

Data structured so that it can be shared and reused

using Linked Data practices and standards…

RDF (Resource Description Framework) Triples, which are controlled subject-predicate-object expressions that produce consistent relationships

and Data Harvesting procedures

Collecting, ingesting and publishing (public/private) data in batches

to create a searchable, browseable, and reusable network of information on research and researchers.
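To make the triple idea concrete, here is a minimal Python sketch. It is an illustration only, not part of the Idaho installation: the individual URI under example.org and the name "Doe, Jane" are made-up placeholders, while the RDF, RDFS, and FOAF URIs are the standard ones.

```python
# Each triple is a (subject, predicate, object) statement.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical URI for one researcher (an assumption for this example).
person = "http://example.org/individual/n1001"

triples = [
    (person, RDF_TYPE, FOAF_PERSON),      # "n1001 is a Person"
    (person, RDFS_LABEL, '"Doe, Jane"'),  # "n1001 is labelled Doe, Jane"
]


def to_ntriples(triples):
    """Serialize triples as N-Triples; quoted objects are treated as literals."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because the same predicate URIs are used everywhere, any other Linked Data system can read these statements without knowing anything about the application that produced them.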

Early History of VIVO

1997-2005: VIVO Network idea developed at Cornell for life and social sciences.

Intended to provide a view of sciences and research “across disciplinary and administrative boundaries.”

2005: Released for Life Sciences

2007: Expanded to all of Cornell University (through the Library)

2009: $12.2 million NIH grant provided to develop a national version with several other partners

2010 – Present: More and more institutions adopting and developing VIVO instances

from “VIVO: Enabling National Networking of Scientists”

http://vivoweb.org/files/websci10_submission_82.pdf

VIVO at the University of Idaho

Spring 2012 – Fall 2012

Approached by Idaho INBRE (a biomedical research network in Idaho) with a question about possibly installing a VIVO instance

Installed VIVO and began setting up and learning the system while gathering feedback from INBRE and other stakeholders

Garnered approval from INBRE faculty to publish their information in the system

Harvested INBRE-related information from public resources: PubMed and the NIH and NSF grants databases


Spring 2013

Began to pursue expanded VIVO

Received approval from institutional IT evaluation group to go forward

Re-branded instance

Presented VIVO to library faculty and administration as possible project going forward

Presented instance and proposal for new position to VP of Research


Summer 2013

VP approved expanded use of VIVO for Research Groups on campus and funding for position

Annie Gaines begins as Scholarly Communication Librarian

Ingest, Ingest, Ingest

Added three additional research groups, as well as the Law School, and associated faculty

Added thousands of grants, publications, and people into the system.


Fall 2013

Presented VIVO publicly on campus for first time

VIVO goes live (accessible from off campus)

Additional organizational descriptions added (Department, College, Grant Structures, etc.)

Gained approval and access to use campus database system, Banner


VIVO Today

Beginning to explore VIVO as front-end for historical documents

Adding all University Faculty

Creating applications and access points for data

Cleaning, always cleaning …

Using this presentation as a prompt for further development of application, as well as further defining:

the system’s presentation

our data’s preservation

and our mission and goals in using the system

Hosting

Provided by the Northwest Knowledge Network

www.northwestknowledge.net

NKN focuses on providing technical support to researchers

Division of UI’s Office of Research

Strong relationship with the UI Library (they are in the building)

Data is replicated to a data center at Idaho National Laboratory

Present future opportunities for integrating VIVO’s information with other research-related tools/systems


Technical Specs

Our installation

Red Hat Linux

Apache Web Server

MySQL

Tomcat

Current Version of VIVO

1.5.2

Will probably upgrade to 1.6 in March 2014

Building VIVO – Two Approaches

Approach #1 – the high-resource approach (ideal)

Requires

Discrete IT department

Available programmers and developers

Formal IT project management

Advantages

High level of integration into existing systems/services

Advanced customization and configuration

Reasonably short time from inception to production

Disadvantages

Red tape

Represents a large commitment by the unit

Building VIVO – Two Approaches

Approach #2 – the low-resource approach (practical)

Requires

Minimum recommended staff identified in the VIVO implementation guide

Experimental mindset

View VIVO as a series of small projects, rather than one large integration into university activities

Advantages

Simple

Manageable

Disadvantages

Time (takes much longer)

Integration with existing services

Creation of custom data ingest tools

Implementation Goals

Start with low-hanging fruit. It is easier to collect

When considering custom tools and processes, our priorities:

1 – re-use from community or locally

2 – buy if possible

3 – build as needed

Build institutional interest in the existing data before soliciting more resources to further our development

Investigate third-party solutions (Symplectic Elements) as alternatives to custom-building internal methods of collecting data

Data Ingestion - General

Typical workflow:

1. Receive data in source format

2. Convert to RDF (usually RDF/XML or Turtle)

3. Associate with VIVO ontology (as needed)

4. Reconcile against existing database

5. Load into the application

6. Re-index if needed
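The first steps of this workflow can be sketched in a few lines of Python. This is a hedged illustration rather than the scripts used at Idaho: the CSV rows, the column names, and the example.org base URI are invented, and the reconciliation step is reduced to a simple duplicate check. The vivo:Grant class comes from the VIVO core ontology.

```python
import csv
import io

# Step 1: data in its source format (made-up rows for illustration).
SOURCE = """id,name,agency
1,Soil Carbon Study,NSF
2,Soil Carbon Study,NSF
3,Rural Health Pilot,NIH
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def reconcile(rows, existing_keys):
    """Step 4 (simplified): drop rows already known or repeated in the batch."""
    seen = set(existing_keys)
    fresh = []
    for row in rows:
        key = (row["name"].strip().lower(), row["agency"].strip().upper())
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh


def to_turtle(rows, base="http://example.org/individual/"):
    """Steps 2-3: emit Turtle typed with a VIVO ontology class."""
    out = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix vivo: <http://vivoweb.org/ontology/core#> .",
        "",
    ]
    for row in rows:
        # Escape backslashes and quotes so the label is a valid literal.
        label = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'<{base}grant{row["id"]}> a vivo:Grant ;')
        out.append(f'    rdfs:label "{label}" .')
    return "\n".join(out)


rows = reconcile(read_rows(SOURCE), existing_keys=[])
turtle = to_turtle(rows)
print(turtle)
```

The resulting Turtle text is what would then be loaded into the application (step 5), followed by a re-index if needed.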

Data Ingestion - Sources

Public Sources

NSF, NIH, USDA Awards

PubMed

Commercial Sources

Web of Science

Must remove “intellectual effort”

CVs, Publication Lists

Must have some means of soliciting them

Local Databases (central university, research groups)

Several institutional sources

Must work through the gatekeepers of each

Need data security review to ensure that institutional concerns are met before public exposure

Data Ingestion - Tools

VIVO Harvester

Extract, Transform, and Load (ETL) tool that takes data from a source and loads it into VIVO automatically

OpenRefine

Data cleaning tool

Very flexible for different datatypes

Extension enables export in RDF format

Reconciliation service allows us to match and de-duplicate entries before export

Custom Conversion Tools (in Python)

Used for CRIS reports output, as well as other consistent, but unusual formats
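The matching idea behind reconciliation can be shown with a small Python sketch. It does not reproduce OpenRefine's own algorithm; it assumes a deliberately crude rule (same last name and first initial) and uses invented names and example.org URIs, so real data would need stricter checks and human review.

```python
import re


def name_key(name):
    """Reduce 'Doe, Jane A.' or 'Jane Doe' to a (last name, first initial) key."""
    name = name.strip()
    if "," in name:
        last, _, rest = name.partition(",")
    else:
        parts = name.split()
        if not parts:
            return ("", "")
        last, rest = parts[-1], " ".join(parts[:-1])
    last = re.sub(r"[^a-z]", "", last.lower())
    rest = rest.strip().lower()
    return (last, rest[0] if rest else "")


def reconcile(incoming, existing):
    """Split incoming names into those matching an existing URI and the rest.

    `existing` maps URI -> label for people already in the system.
    """
    index = {name_key(label): uri for uri, label in existing.items()}
    matched, unmatched = {}, []
    for name in incoming:
        uri = index.get(name_key(name))
        if uri:
            matched[name] = uri
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical data for illustration only.
existing = {"http://example.org/individual/n1": "Doe, Jane A."}
matched, unmatched = reconcile(["Jane Doe", "J. Doe", "Smith, Pat"], existing)
print(matched)
print(unmatched)
```

Names that match are attached to the existing record instead of creating a duplicate; unmatched names are candidates for new entries.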

Ontology Extensions

Custom University of Idaho model prefixed with “uidaho:”

Goals with our extensions

Re-use as much as possible

Establish the local need before creating

Always associate classes within the VIVO hierarchy so that data is not fully reliant on uidaho for context

Examples

Members of Idaho EPSCoR, Idaho INBRE, REACCH-PNA

Non-UI/Courtesy Faculty
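The rule about anchoring local classes in the VIVO hierarchy can be expressed as a simple check. The sketch below is an assumption-laden illustration: the uidaho namespace URI and the two class names are hypothetical stand-ins, not the actual Idaho extension; vivo:FacultyMember and foaf:Person are existing shared classes.

```python
VIVO = "http://vivoweb.org/ontology/core#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical namespace URI for the local "uidaho:" prefix.
UIDAHO = "http://example.org/ontology/uidaho#"

# Hypothetical local classes, each mapped to its parent class.
LOCAL_CLASSES = {
    UIDAHO + "CourtesyFaculty": VIVO + "FacultyMember",
    UIDAHO + "InbreMember": FOAF + "Person",
}

# Namespaces that other VIVO installations already understand.
SHARED_NAMESPACES = (VIVO, FOAF)


def unanchored(classes):
    """Return local classes whose parent is not in a shared namespace."""
    return [cls for cls, parent in classes.items()
            if not parent.startswith(SHARED_NAMESPACES)]


def to_turtle(classes):
    """Emit rdfs:subClassOf declarations for the local classes."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
    ]
    for cls, parent in sorted(classes.items()):
        lines.append(f"<{cls}> a owl:Class ;")
        lines.append(f"    rdfs:subClassOf <{parent}> .")
    return "\n".join(lines)


print(unanchored(LOCAL_CLASSES))
print(to_turtle(LOCAL_CLASSES))
```

With every local class declared as a subclass of a shared one, a consumer that ignores the uidaho terms still sees each individual as, for example, a faculty member.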

Data Re-use - Fuseki

Apache Jena - Fuseki project

jena.apache.org/documentation/serving_data/

Enables external access to VIVO data

Without Fuseki, data re-use is limited to those authenticated with the system

Created examples of data re-use to assist in marketing efforts

Goal: to establish the added value of putting data in VIVO

Example: Labs that need to report the results of their research by creating publication lists, or that want to display spatial, temporal, or conceptual aspects of UI research to stakeholders or students, could use this feature
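External access of this kind usually means sending a SPARQL query over HTTP. The following Python sketch builds such a request using only the standard library. The endpoint URL (host, port, dataset name, and service path) is an assumption and would differ per installation, and the query is only an example; the request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Fuseki query endpoint; adjust to the actual server and dataset.
ENDPOINT = "http://localhost:3030/vivo/query"

# Example query: the first ten grants and their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vivo: <http://vivoweb.org/ontology/core#>
SELECT ?grant ?label
WHERE { ?grant a vivo:Grant ; rdfs:label ?label . }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To run it against a live server: urllib.request.urlopen(request)
```

Any script, web page, or reporting tool that can issue this kind of request can reuse the data without logging in to VIVO itself.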


Example 1:

A very simple way to look at awards data. This presents the number of awards by agency. It uses a JavaScript library called Sgvizler to turn JSON data from Fuseki into a Google Charts visualization.
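The JSON that a SPARQL endpoint returns has a standard shape, which is what makes this kind of charting straightforward. The Python sketch below tallies awards by agency from such a result. The sample data and the ?award and ?agency variable names are invented for illustration; the presentation's own example did this in the browser with Sgvizler.

```python
import json
from collections import Counter

# Made-up result in the standard SPARQL JSON results format.
SAMPLE = json.loads("""
{
  "head": {"vars": ["award", "agency"]},
  "results": {"bindings": [
    {"award": {"type": "uri", "value": "http://example.org/individual/a1"},
     "agency": {"type": "literal", "value": "NSF"}},
    {"award": {"type": "uri", "value": "http://example.org/individual/a2"},
     "agency": {"type": "literal", "value": "NIH"}},
    {"award": {"type": "uri", "value": "http://example.org/individual/a3"},
     "agency": {"type": "literal", "value": "NSF"}}
  ]}
}
""")


def awards_by_agency(results):
    """Count result rows per agency, most frequent first."""
    counts = Counter()
    for binding in results["results"]["bindings"]:
        counts[binding["agency"]["value"]] += 1
    return counts.most_common()


print(awards_by_agency(SAMPLE))
```

The list of (agency, count) pairs is the kind of two-column table that a charting library needs as input.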


Example 2:

Another simple view using Sgvizler. This shows a comparison of two variables, awards and publications, for personnel in a specific research group. It would need work as a formal graph, but it points to the way that the data can be re-used.


Example 3:

Another simple example of data re-use, using a JavaScript/Ajax technique to display a list of journal titles and faculty within a specific research group. Links to the faculty members’ VIVO profiles are associated with their names.

VIVO as Institutional Repository

Background

When Annie was brought on for Scholarly Communications, one of her tasks was to develop an IR for the UI.

Some potential platforms to use for UI IR:

CONTENTdm – too flat

Bepress – too expensive

VIVO?

‘Institutional repositories’

“A set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.”

Clifford Lynch, ARL Bimonthly Report 226, Feb. 2003.

“Digital collections that capture and preserve the intellectual output of university communities.”

Raym Crow, The Case for Institutional Repositories, SPARC, 2002

‘Institutional repositories’

Are:

Institutionally defined and managed

Collection of scholarly work

Both cumulative and perpetual

Open

Provide:

Long term preservation

Wide dissemination

Showcase for scholars and the institution

Challenges

Copyright issues, varying access

Buy-in from faculty, voluntary submissions

Getting people to care

VIVO as IR?

Not your typical IR interface

Dynamic browsing and searching

Interconnectedness in a large network

Includes diverse materials, not just article pre-prints

Includes citations for all works, not just the ones hosted in the IR

Linked data format allows for reuse of data for a variety of purposes

The following page shows a thesis document in VIVO

Theory vs. Practice

Although VIVO can act as a front end, the documents must be hosted elsewhere

We deposit our docs in CONTENTdm and link to the PDF in VIVO

This makes things easier, but also more complicated

See example of the same thesis document in CONTENTdm on the next page

Theory vs. Practice

We wanted to close this presentation by asking the group some questions. If you have any advice for us on this project, we would love to hear from you!

Are more access points better or more confusing?

Should we include historical documents in the VIVO IR?

Which page should be the main collection?

Should we provide links to all collections? Or link from one into the other?

What are best practices with unusually constructed IRs?

Thank you!
