Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
From Big Data to the Big Picture
#SAGETalks
Caroline Muglia, Head of Resource Sharing and Collection
Assessment Librarian at University of Southern California,
manages the Interlibrary Loan and Document Delivery
department and leads the collection assessment efforts for
the Library. Before this position, Caroline worked at the
Library of Congress and later as a Data Librarian for an
educational technology company.
Jill Parchuck has been the Associate University Librarian
for Science, Social Science and Medicine at Yale
University since 2014. Other positions Jill has held at Yale
include Director, Science and Social Science Libraries and
Co-Director of the Center for Science and Social Science
Information from 2010 to 2014 and Director of Social
Science Libraries and Information Services from 2007 to
2010.
While we do our best to answer as many questions as we can, time constraints may not allow us to
answer every question. Thank you for understanding.
Send us your questions!
Send in your questions
via the Question Box on your screen. →
Using Twitter? Use
the hashtag
#SAGETalks.
Introduction
• Big data initiatives are plentiful!
• Libraries can play an important role
• What steps can librarians take to contribute to big data projects?
• How can libraries add value to big data projects?
• How can libraries determine the needs for data support?
Areas we will cover
I. Datasets (Homegrown and Purchased)
II. Licensing Data
III. Storage and Repositories
IV. Software and Tools
V. Looking Ahead
What is Big Data?
• Volume: Amount of data being created and ingested.
What qualifies as “big”?
• Variety: Number of types of data
• Velocity: Speed at which data is being created and
processed
• Value: How data is being analyzed and utilized
Homegrown Datasets
Guiding questions:
● Who is creating datasets and how?
● What are the uses of the datasets?
● Where are the datasets stored?
• Created by researchers
• Spatial Sciences student projects
• USC Neuroimaging and Informatics
• Created by partnerships
• Big Data for Discovery Science
• Libraries
• USC Shoah Digital Library (8 petabytes)
• Created by researchers
• Institution for Social and Policy Studies (ISPS)
• Yale Open Data Access Project (YODA)
• Yale Proteomics Expression Database (YPEDS)
• Produced by administrative units of the institution
• Yale Sustainability
Purchased Datasets
Guiding questions:
● In what format does the library receive the datasets?
● Where are the datasets stored?
● What kind of access do users have?
● How can users discover the datasets?
• Subject specialist receives request from researchers and places order
• Data librarian receives and manages data and places it on local server
• Cataloger creates records for the online discovery system
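The hand-offs in this workflow can be pictured as one record that each role updates in turn. The following is a minimal sketch only: the `DatasetRecord` class, its field names, and the example values are hypothetical and do not come from the slides or from any particular library system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical record for a purchased dataset; field names are assumptions."""
    title: str
    requested_by: str        # researcher who asked the subject specialist
    data_format: str = ""    # filled in by the data librarian on receipt
    server_path: str = ""    # location on the local server
    cataloged: bool = False  # set once a discovery record exists


def receive(record, data_format, server_path):
    """Data librarian step: note the format and where the files were placed."""
    record.data_format = data_format
    record.server_path = server_path
    return record


def catalog(record):
    """Cataloger step: emit a JSON record for the discovery system."""
    if not record.server_path:
        raise ValueError("dataset must be stored before it is cataloged")
    record.cataloged = True
    return json.dumps(asdict(record), indent=2)


# Example with made-up values
order = DatasetRecord(title="Example Survey Data", requested_by="Researcher A")
receive(order, data_format="CSV", server_path="/data/purchased/example_survey")
print(catalog(order))
```

Refusing to catalog a dataset that has no storage location mirrors the order of the steps on the slide: the discovery record is created only after the data is on the server.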
Licensing Data
Guiding questions:
● What are the terms of use?
● Access vs. ownership?
• Review criteria of license
• Ensure the widest possible use of content
• Ensure that a viable platform is available to provide access
• Ensure that metadata can be provided
• Can we retain a backup copy?
Storage & Repositories
Guiding questions:
● How much storage space is needed?
● What do we need the repository to do?
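One rough starting point for the storage question is to total the size of the files already on hand. This is a minimal sketch, assuming the datasets sit in a local directory; the function names `total_bytes` and `human_readable` and the choice of decimal units are illustrative, not part of the presentation.

```python
import os


def total_bytes(root):
    """Sum the sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(n_bytes):
    """Format a byte count using decimal units (1 TB = 10**12 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(n_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Report the footprint of the current directory
print(human_readable(total_bytes(".")))
```

A current total is only a baseline; expected growth, backup copies, and any retention requirements in a license would need to be added on top of it.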
• USC Digital Repository
• ICPSR
• Departments/Schools
• Contract to repositories
• Purchase server space from university
• Smaller data sets
• External hard drives
• Registry of Research Data Repositories
• ICPSR - Inter-university Consortium for Political and Social Research
• Yale Social Science Data Archive - all in local discovery system
Services for Analyzing Big Data
Guiding questions:
● Who owns the data? What rights management is needed?
● What do you need to do?
● Who will be using the data?
● What output options do you need to have?
Support For Using Data
• Organizing data
• Statistical analysis
• Cleaning data
• Manipulating data
• Managing data
• Data Visualization
• Retaining data
• Libraries
• Subject librarians cultivate different skills
• Tableau license
• University-wide
• SC-CTSI (Clinical Data Analysis)
• Center for High Performance Computing (HPC)
• Yale Center for Science and Social Science Information (CSSSI) - data services
• Yale StatLab Consultants - statistical analysis
• CSSSI Research Data Management - guide
• Yale Research Data Consultation Group
Looking Ahead
What can you do now?
• Identify unique role that library can play
• Information management is a library service
• Data literacy or teaching with data; data education
• Expert trainers in Tableau, TDM tools
• Metadata expertise
• Store/make accessible other departments’ raw data
• Can libraries provide analytical services?
• Learn needs of the institution
• Digital humanities projects can be a starting point
• Identify stakeholders
• Vendors and librarians can act as research partners
• Unique relationship that other departments may not have
Future considerations
What should you be prepared to handle in the near future?
• Data management plans
• NSF data management requirements
• Open data
• Open Government
• Los Angeles Open City
• Open Science
• Data science
• More students trained in data science; increased knowledge on campus
• What is the library’s role (instruction, collection development) in meeting these research needs, but also in capitalizing on them?
Webinar recording, slides, and follow-up Q&A will be emailed to you and available on connection.sagepub.com.
Thank you!
Be sure to check our website for updates on our webinar series!