Academic Libraries and Big Data: Trends in Collection, Publication, Preservation, and Access

#SIBF15 #SIBFALA15 @MCDONALD  @ALALIBRARY @SHJINTLBOOKFAIR 

TOPICS

•  Big Data in Libraries
•  Why Libraries?
•  Libraries Supporting Data
   •  Analysis
   •  Publication
   •  Workflow (Re-use)

cc: Ray Schamp - http://www.flickr.com/photos/19009479@N00

Big Data

cc: Mark McLaughlin - https://www.flickr.com/photos/51035737977@N01

BIG DATA COMES FROM MANY SMALLER PACKAGES

cc: FullPixel Photography - http://www.flickr.com/photos/98543207@N02

LIBRARIES

PROVIDING OPEN DATA REPOSITORIES

cc: Paul Stainthorp - https://www.flickr.com/photos/30409117@N07

Research is now about workflow.

Workflow is often about data.

cc: Vinovin - https://www.flickr.com/photos/10212590@N00

http://innoscholcomm.silk.co/

KEY POINT 1 WHAT IS YOUR RESEARCHER WORKFLOW?

cc: yaph - https://www.flickr.com/photos/8471827@N06

Libraries

cc: FullPixel Photography - http://www.flickr.com/photos/98543207@N02

LIBRARIES ARE: COLLABORATION SPACES

cc: TechSoup for Libraries - https://www.flickr.com/photos/9279573@N02

LIBRARIES ARE: PUBLISHERS

cc: Thomas Hawk - https://www.flickr.com/photos/51035555243@N01

The Once and Future Publishing Library – Okerson/Holzman/CLIR

LIBRARIES ARE: REPOSITORIES

cc: Halans - https://www.flickr.com/photos/48889073931@N01

"Nobel prizes have been given for inventing

instruments. I'm eagerly waiting for one for

inventing software."

Daniel S. Katz - U Chicago

cc: Yddlywinker - http://www.flickr.com/photos/41687592@N00

REPOSITORIES Institutional

Publish VMs (Virtual Machines) with DOIs (Digital Object Identifiers)

A national science & engineering cloud http://jetstream-cloud.org/

Data Publishing

IU ScholarWorks http://scholarworks.iu.edu/

“Libraries serve the research and learning needs of their universities.”

cc: Andreas-photography - http://www.flickr.com/photos/19367634@N05

From a talk by Lorcan Dempsey – The Library in the Life of a User http://www.slideshare.net/lisld/the-library-in-the-life-of-the-user

USE CASE HATHITRUST RESEARCH CENTER

Non-Consumptive Research Paradigm

•  No action or set of actions on the part of users, whether acting alone or in cooperation with other users, over the duration of one or multiple sessions, can result in sufficient information being gathered from the collection of copyrighted works to reassemble pages from the collection.

•  The definition disallows collusion between users, or accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
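
The constraint above can be illustrated with per-page token counts. The following is a minimal Python sketch, assuming a simple regular-expression tokenizer; the function name `page_features` and the sample sentence are hypothetical, and this is not the HathiTrust Research Center's actual implementation. It shows how a page can be reduced to unordered word counts, which support analysis without exposing the running text.

```python
import re
from collections import Counter


def page_features(page_text: str) -> dict[str, int]:
    """Return unordered token counts (a bag of words) for one page."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", page_text.lower())
    counts = Counter(tokens)
    # Emit keys alphabetically so the original word order is not preserved.
    return dict(sorted(counts.items()))


# Invented sample text standing in for one page of a volume.
page = "The whale, the white whale, was seen at dawn."
features = page_features(page)
print(features)
```

A researcher receiving only `features` can count and compare words, but the sequence of words on the page is not part of the output. Whether counts alone satisfy a given non-consumptive policy is a separate question that this sketch does not settle.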

•  Repository
   – 13+ million volumes | 3+ billion pages
   – 50% of volumes are in English
   – Material from the 15th C. on | 20th C. concentration
   – 70% in copyright or undetermined | 30% open
•  Interface
   – Search and read books in the public domain

About the HathiTrust Digital Library

HathiTrust Ecosystem

HathiTrust Research Center Ecosystem

1.  Secure Portal Access
2.  Data Capsule Access
3.  Feature Extraction Services

HTRC Approaches

HTRC Data Capsule Workflow

HTRC Data Capsule

Maintenance Mode | Secure Mode

Running other workflow

• The ability to slice through a massive corpus constructed from many different library collections, and out of that to construct the precise workset required for a particular scholarly investigation, is an example of the “game changing” potential of the HathiTrust...

Grand Motivation

Scope    

Basic Portal Workflow

DISTANT READING

MORETTI-STANFORD

cc: chrismar - https://www.flickr.com/photos/14334258@N00

Understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data.
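
As a rough illustration of that idea, here is a minimal Python sketch that aggregates a term's relative frequency by decade across a corpus instead of reading any single text. The toy corpus, the function name `relative_frequency_by_decade`, and the tokenizer are all assumptions made for this example; a real study would run over thousands of volumes.

```python
import re
from collections import Counter

# Toy corpus of (publication year, text) pairs, invented for illustration.
corpus = [
    (1851, "the whale and the sea"),
    (1855, "the sea and the ship"),
    (1902, "the city and the machine"),
    (1908, "the machine in the city and the factory"),
]


def relative_frequency_by_decade(docs, term):
    """Share of all tokens in each decade that equal `term`."""
    totals = Counter()  # total tokens seen per decade
    hits = Counter()    # occurrences of `term` per decade
    for year, text in docs:
        decade = year // 10 * 10
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term)
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


trend = relative_frequency_by_decade(corpus, "machine")
for decade, share in trend.items():
    print(decade, round(share, 3))
```

The result is a trend over time for one word, a question that can only be asked of the corpus as a whole.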

KEY POINT 2 WHAT TYPES OF DATA INTERFACES?

cc: Eric Fischer - https://www.flickr.com/photos/24431382@N03

NEW INTERFACES

Allen and Murdock – Indiana University

cc: caseorganic - https://www.flickr.com/photos/28980639@N02

NEW INTERFACES

Chris Forster – Syracuse University

cc: caseorganic - https://www.flickr.com/photos/28980639@N02

NEW INTERFACES

Jonathan Goodwin – Univ of Louisiana

cc: caseorganic - https://www.flickr.com/photos/28980639@N02

KEY POINT 3 WHAT TYPES OF DATA ARE NEEDED?

cc: Hans-Werner Guth - http://www.flickr.com/photos/42448330@N00

REPOSITORIES Software

REPOSITORIES Data

Libraries and the Researcher

•  What are the workflows needed by your researchers?

•  What are the interfaces that support those workflows?

•  What is the data that supports those workflows, interfaces, and researchers?
   •  Local
   •  Regional
   •  International

cc: Marie in NC - http://www.flickr.com/photos/24732687@N00

Libraries are repositories of data for the creation of new knowledge.

cc: young_einstein - http://www.flickr.com/photos/25047883@N00

New Library Services provide support for the workflows of new knowledge.

cc: tjmwatson - https://www.flickr.com/photos/63603238@N00

Photo by Marcus Ramberg - Creative Commons Attribution-NonCommercial License https://www.flickr.com/photos/40021607@N00

Created with Haiku Deck

THANKS

#SIBF15 | #SIBFALA15

cc: nateOne - https://www.flickr.com/photos/49998984@N00
