#watitis2014 watitis.uwaterloo.ca @watitisconf ONTARIO LIBRARY RESEARCH CLOUD: BUILDING A PROVINCE-WIDE RESEARCH CLOUD FOR ONTARIO’S ACADEMIC LIBRARIES Pascal Calarco, University of Waterloo Library Andrew McAlorum, Information Systems & Technology


#watitis2014

watitis.uwaterloo.ca @watitisconf

ONTARIO LIBRARY RESEARCH CLOUD: BUILDING A PROVINCE-WIDE RESEARCH CLOUD FOR ONTARIO’S ACADEMIC LIBRARIES

Pascal Calarco, University of Waterloo Library

Andrew McAlorum, Information Systems & Technology


AGENDA

• Problem we’re trying to solve (Pascal)
• Funding and project plan (Pascal)
• Technology overview (Andrew)
• Some likely use cases (Andrew)
• Next steps (Pascal)
• Q&A

OLRC: Pascal Calarco & Andrew McAlorum


LIBRARIES’ GROWING STORAGE NEEDS

• Digitized physical materials: books, journals, film, audio

Reformatting to conserve originals, e.g., acidic paper such as newspapers

Reformatting to increase access, e.g., rare materials

Format migration to preserve content, e.g., 16mm film


LIBRARIES’ GROWING STORAGE NEEDS

• Born-digital scholarly content for long-term stewardship:

E-Theses and supplemental material

Scholarship: working papers, preprints, open access publications

Research data: numeric, geospatial, image, audio

Websites and digital ephemera of academic interest

Donated electronic materials for Special Collections, e.g., John English’s hard drives of personal email correspondence, drafts and other materials


OCUL STORAGE SURVEY (2013)

• 10 of 21 institutions responded: six with more than 10,000 FTE, four with fewer than 10,000 FTE

• Preservation & access needs (share of respondents):

80%: digitized print content

80%: faculty publications

60%: donated digital content

50%: research data

50%: GIS data

40%: purchased digital resources

20%: corporate records

20%: E-Theses


OCUL SURVEY: STORAGE NEEDS

• Current storage requirements: 100 GB to 30 TB per institution; 58.5 TB total across all respondents

• Expected storage needs over the next 2-3 years (share of respondents):

20%: 100 TB+

40%: 10 TB-100 TB

20%: under 10 TB

250TB total for all 10 institutions


OCUL SURVEY: STORAGE PROVISIONING

• 80% partner with campus IT often/mostly
• 60% provision in-house often/mostly
• 40% provision with other partner libraries often/mostly
• 30% provision with commercial services often/mostly


OCUL STORAGE SURVEY: TOP FEATURES (2013)

• Large storage on demand
• Low cost
• Canadian-based hosting
• Transparent pricing
• Archival-quality storage


STORAGE ARCHITECTURES AND COST TIERS


CLOUD OPTIONS

• Amazon S3/Glacier: $500k/year for the current 250 TB of Scholars Portal (SP) content

$2,000/TB per year, recurring

• DuraCloud: an Amazon reseller that adds preservation and management tools

$1000-$1500/TB per year, recurring

• Private cloud (OpenStack): $280-$350/TB per year, amortized over three years
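The per-TB rates above can be checked against the $500k/year figure with simple arithmetic. This minimal sketch multiplies each quoted rate range by the 250 TB volume from the slide. The rates are the slide's own (currency not stated). Comparing them side by side assumes the options are interchangeable, which they are not: DuraCloud adds tools, and the OpenStack figure is amortized over three years.

```python
# Rate ranges quoted on the slide, in dollars per TB per year (currency not stated).
PER_TB_PER_YEAR = {
    "Amazon S3/Glacier": (2000, 2000),
    "DuraCloud": (1000, 1500),
    "Private cloud (OpenStack)": (280, 350),
}


def annual_cost(volume_tb, rate_range):
    """Return the (low, high) annual cost of storing volume_tb terabytes."""
    low, high = rate_range
    return volume_tb * low, volume_tb * high


if __name__ == "__main__":
    volume_tb = 250  # current Scholars Portal content, per the slide
    for option, rates in PER_TB_PER_YEAR.items():
        low, high = annual_cost(volume_tb, rates)
        print(f"{option}: ${low:,} to ${high:,} per year")
```

Running it gives $500,000 for S3/Glacier, which matches the slide. It gives $250,000 to $375,000 for DuraCloud and $70,000 to $87,500 for the private cloud.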


MTCU PROPOSAL AND PIF FUNDING

• 2013/2014: OCUL was awarded $1.2 million in Productivity and Innovation Fund (PIF) funding to start up the OLRC

• 50 TB per founding partner institution
• Triplestore preservation: copies of content at three different co-located nodes for redundancy and error correction

• Text mining portal for stored ScholarsPortal content
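The slides do not say how the three copies are checked or repaired. One common approach is a fixity audit, shown here only as a hypothetical sketch. It computes a checksum for each copy, treats the value shared by at least two copies as correct, and overwrites any copy that disagrees. The `TripleCopyStore` class and the node names are illustrative. The "nodes" are in-memory dictionaries, not real storage.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 fixity checksum for one stored copy."""
    return hashlib.sha256(data).hexdigest()


class TripleCopyStore:
    """Hypothetical in-memory model: three nodes, each holding one copy of every object."""

    def __init__(self, node_names=("node-a", "node-b", "node-c")):
        self.nodes = {name: {} for name in node_names}

    def put(self, key: str, data: bytes) -> None:
        # Write an identical copy of the object to every node.
        for store in self.nodes.values():
            store[key] = data

    def audit_and_repair(self, key: str) -> list:
        """Overwrite copies whose checksum disagrees with the majority.

        Returns the names of the nodes that were repaired.
        """
        digests = {name: digest(store[key]) for name, store in self.nodes.items()}
        majority_digest, votes = Counter(digests.values()).most_common(1)[0]
        if votes < 2:
            raise RuntimeError(f"no two copies of {key!r} agree; cannot repair")
        # Any copy with the majority checksum can serve as the repair source.
        good_copy = next(
            self.nodes[name][key] for name, d in digests.items() if d == majority_digest
        )
        repaired = []
        for name, d in digests.items():
            if d != majority_digest:
                self.nodes[name][key] = good_copy
                repaired.append(name)
        return repaired


if __name__ == "__main__":
    store = TripleCopyStore()
    store.put("thesis.pdf", b"original bytes")
    store.nodes["node-b"]["thesis.pdf"] = b"bit rot"  # simulate silent corruption
    print(store.audit_and_repair("thesis.pdf"))  # the corrupted node is reported
```

Three copies is the smallest count at which a single corrupted copy can be outvoted. With only two copies, a mismatch shows that one copy is bad but not which one.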


OPENSTACK

• An open-source cloud computing platform, primarily deployed as Infrastructure-as-a-Service (IaaS)

• Swift: the OpenStack object store; data is stored and retrieved via an HTTP API

• Integrate OpenStack/Swift with digital repository architectures

• Develop a Dropbox-like web interface for cloud storage
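Swift exposes objects through an HTTP REST API. A client authenticates, receives a token and a storage URL, and then PUTs each object to {storage URL}/{container}/{object}. This minimal sketch only builds such a request with Python's standard library and does not send it. The endpoint, token, container, and object names are hypothetical placeholders, and the slides do not describe how the OLRC issues credentials. The optional ETag header carries the MD5 of the upload so that Swift can reject it if the bytes arrive altered.

```python
import hashlib
import urllib.parse
import urllib.request


def build_swift_put(storage_url: str, token: str, container: str,
                    object_name: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a Swift object PUT request."""
    url = "/".join([
        storage_url.rstrip("/"),
        urllib.parse.quote(container, safe=""),
        urllib.parse.quote(object_name),  # keeps "/" so pseudo-folders work
    ])
    headers = {
        "X-Auth-Token": token,                 # token from the auth step
        "ETag": hashlib.md5(data).hexdigest(),  # lets Swift verify the upload
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


if __name__ == "__main__":
    # Hypothetical endpoint and token; sending would be urllib.request.urlopen(req).
    req = build_swift_put("https://swift.example.org/v1/AUTH_library",
                          "hypothetical-token", "etds",
                          "2014/thesis one.pdf", b"%PDF-1.4 ...")
    print(req.get_method(), req.full_url)
```

A successful upload returns 201 Created. A web front end like the planned Dropbox-style interface would wrap calls of this kind.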


USE CASES

• Audience: librarians, faculty
• Digital preservation
• Institutional and personal storage
• Repositories
• Research data management
• Text mining large volumes of digital textual content for research purposes


DIGITAL CURATION


FEDORA COMMONS

An open-source digital object repository that serves as the underlying architecture for Islandora, Hydra, and other digital asset management systems.


DSPACE

An open-source, turnkey institutional repository platform for building open access repositories of scholarly and published digital content.


ARCHIVEMATICA

An open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.


DATAVERSE

• An open source web application for publishing, citing, analyzing and preserving research data.

• Research data management focus


TEXT MINING

Potential uses by researchers in the digital humanities:
• Entity recognition
• Part-of-speech analysis
• Topic modeling
• Network analysis
• Visualization
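As an illustration of the network-analysis item, a term co-occurrence network links two terms whenever they appear in the same document. Each link is weighted by the number of documents the two terms share. This is a minimal sketch using only the Python standard library. The stopword list, tokenizer, and sample sentences are hypothetical, and this is not the OLRC text-mining portal's implementation.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; real corpora need a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "on", "is", "was"}


def tokenize(text: str) -> list:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_network(documents) -> Counter:
    """Weighted edges: (term_a, term_b) -> number of documents containing both."""
    edges = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))  # sort so each pair has one canonical order
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    docs = [
        "Library research cloud",
        "Research cloud storage",
        "Library storage",
    ]
    for (a, b), weight in cooccurrence_network(docs).most_common(3):
        print(f"{a} -- {b}: {weight}")
```

The weighted edge list can be loaded into a graph or visualization tool. This covers the network-analysis and visualization items on the slide.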


CURRENT STATUS & MILESTONES

• October 2014: integration with Archivematica
• December 2014: integration with Dataverse
• Q1 2015: storage nodes finalized; installation of the Waterloo/Guelph/Laurier node
• March 2015: integration with Fedora Commons
• May 2015: third Hackfest, Text Mining Portal
• June 2015: integration with DSpace


THANKS! QUESTIONS?

• Pascal Calarco, uWaterloo [email protected] x38215

• Andrew McAlorum, [email protected] x31135
