Micah Altman, Harvard: Policy-Based Data Management. The 2nd Research Data Access and Preservation (RDAP) Summit, an ASIS&T Summit, March 31-April 1, 2011, Denver, CO. In cooperation with the Coalition for Networked Information. http://asist.org/Conferences/RDAP11/index.html
Policy-Based Digital Preservation: SafeArchive & The Dataverse Network®
Micah Altman, Institute for Quantitative Social Science, Harvard University
Prepared for the Research Data Access and Preservation Summit, ASIS&T
March 2011
Collaborators*
Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy
Research Support
Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
* And co-conspirators
Related Work
Reprints available from: http://maltman.hmdc.harvard.edu
Altman, M., and Crabtree, J. 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS." Proceedings of Archiving 2011. (Forthcoming)
Altman, M., Beecher, B., and Crabtree, J.; with L. Andreev, E. Bachman, A. Buchbinder, S. Burling, P. King, M. Maynard. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain 21(2): 44-47.
Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., and Young, C. 2009. "Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist 72(1): 169-182.
Crosas, M. 2011. "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data." D-Lib Magazine 17(1/2).
King, G. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research 36(2): 173-199.
Gutmann, M., Abrahamson, M., Adams, M.O., Altman, M., Arms, C., Bollen, K., Carlson, M., Crabtree, J., Donakowski, D., King, G., Lyle, J., Maynard, M., Pienta, A., Rockwell, R., Timms-Ferrara, L., and Young, C. 2009. "From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data." Library Trends 57(3): 315-33
SafeArchive: TRAC-Based Management of LOCKSS
Facilitating collaborative replication and preservation with technology…
• Collaborators declare explicit, non-uniform resource commitments
• Policy records commitments and storage network properties
• Storage layer provides replication, integrity, freshness, versioning
• SafeArchive software provides monitoring, auditing, and provisioning
• Content is harvested through HTTP (LOCKSS) or OAI-PMH
• Integration of LOCKSS, The Dataverse Network, TRAC
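The idea of recording non-uniform commitments as policy can be illustrated with a minimal Python sketch. It is not SafeArchive's actual policy schema: the record types `Commitment` and `ReplicationPolicy`, their fields, and the helper `check_commitments` are hypothetical, and the check assumes each partner stores at most one full copy of a collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    # Hypothetical record: storage one partner has pledged to the network.
    partner: str
    capacity_gb: float


@dataclass(frozen=True)
class ReplicationPolicy:
    # Hypothetical record: size of a collection and how many copies to keep.
    collection_gb: float
    min_copies: int


def check_commitments(policy, commitments):
    """Return a list of problems; an empty list means the pledges suffice.

    Assumes each partner holds at most one full copy of the collection, so
    only partners whose pledge covers the whole collection can host a copy.
    """
    problems = []
    able = [c.partner for c in commitments if c.capacity_gb >= policy.collection_gb]
    if len(able) < policy.min_copies:
        problems.append(
            f"only {len(able)} partner(s) can hold a full copy; "
            f"policy requires {policy.min_copies}"
        )
    return problems


# Usage: three partners with non-uniform pledges; the policy asks for three copies.
policy = ReplicationPolicy(collection_gb=500, min_copies=3)
pledges = [Commitment("A", 1000), Commitment("B", 600), Commitment("C", 200)]
print(check_commitments(policy, pledges))
# ['only 2 partner(s) can hold a full copy; policy requires 3']
```

Keeping the commitments as data, rather than burying them in configuration, is what lets the same records drive both auditing and provisioning.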
Adding Policy to LOCKSS
LOCKSS (Lots of Copies Keep Stuff Safe)
• Widely used in the library community
• Self-contained open-source replication system; low maintenance, inexpensive
• Harvests resources via web crawling, OAI-PMH, database queries,…
• Maintains copies through a secure peer-to-peer protocol
• Zero trust & self-repairing
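The self-repairing behavior described above can be pictured with a toy majority vote in Python. This is a deliberately simplified illustration, not the LOCKSS protocol: real LOCKSS polls sample peers, hash content with poll-specific nonces, and include defenses against misbehaving peers. The function `poll_and_repair` and its input shape are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    # Each peer's "vote" is a hash of its copy of the content unit.
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(copies: dict) -> dict:
    """One toy audit-and-repair round over a single content unit.

    copies maps a peer name to that peer's bytes. Peers whose hash disagrees
    with a strict majority are overwritten with a majority copy. If no hash
    has a strict majority, nothing is changed (a real system would alert).
    """
    if not copies:
        return {}
    votes = {peer: digest(data) for peer, data in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        return dict(copies)
    good = next(data for peer, data in copies.items() if votes[peer] == winner)
    return {peer: data if votes[peer] == winner else good
            for peer, data in copies.items()}


repaired = poll_and_repair({"a": b"v1", "b": b"v1", "c": b"bit-rot"})
print(repaired["c"])  # b'v1'
```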
What Does SafeArchive Add?
• Auditing: easily monitor the number of copies of content in the network
• Provisioning: ensure sufficient copies and distribution
• Collaboration: coordinate across partners, monitor resource commitments
• Provide restoration guarantees
• Integrate with the Dataverse Network digital repository
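Auditing the number of copies amounts to counting, per content unit, the hosts that hold a recently verified copy. The Python sketch below assumes a hypothetical flattened inventory of `(host, unit_id, last_verified)` tuples and a freshness window; it is an illustration, not SafeArchive's report format.

```python
from datetime import date, timedelta


def audit(inventory, min_copies, max_age_days, today):
    """Report content units that have too few recently verified copies.

    inventory: iterable of (host, unit_id, last_verified) tuples, a
    hypothetical flattened view of what each host reports holding.
    A copy counts only if verified within max_age_days of today, and a
    host is counted at most once per unit.
    Returns {unit_id: fresh_copy_count} for units below min_copies.
    """
    cutoff = today - timedelta(days=max_age_days)
    fresh = {}
    for host, unit, verified in inventory:
        hosts = fresh.setdefault(unit, set())  # record every unit, even if stale
        if verified >= cutoff:
            hosts.add(host)
    return {u: len(h) for u, h in fresh.items() if len(h) < min_copies}


inventory = [
    ("host1", "study-1", date(2011, 3, 30)),
    ("host2", "study-1", date(2011, 3, 29)),
    ("host3", "study-1", date(2011, 3, 28)),
    ("host1", "study-2", date(2011, 3, 30)),
    ("host2", "study-2", date(2010, 1, 15)),  # stale verification
]
print(audit(inventory, min_copies=3, max_age_days=30, today=date(2011, 3, 31)))
# {'study-2': 1}
```

A unit missing from every host never appears in this inventory, so a fuller audit would also compare against the collection's manifest.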
Why this tool?
To help institutions make commitments aligned with their policies and incentives, and
to automatically execute and monitor those commitments and policies
(Self-interest… Support Data-PASS partnership agreements and transfer protocols)
This tool provides a targeted vertical slice of functionality through the policy stack…
Another Why…
R.I.P.
SafeArchive Components
[Figure: SafeArchive components, current and planned]
SafeArchive Auditing & Reports
[Figure: example report fragments]
SafeArchive: TRAC Alignment
SafeArchive audits provide evidence for compliance with policies on:
• archival storage & preservation (B4)
• independent audit mechanisms (B2)
• appropriate system infrastructure (C1)
• disaster planning and recovery (C3)
SafeArchive supports embedded policy documentation:
• Organizational infrastructure (A1-4)
• Collection policies (B2.5, 2.7, 5.2)
• System configuration (C1.7-1.10)
SafeArchive: Schematizing Policy and Behavior
“The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies.”
Policy
Schematization
Behavior (Operationalization)
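The path from the TRAC sentence to a schema to checkable behavior can be sketched in Python. The TRAC text only requires that copies and their locations be identifiable; the `CopyRule` fields and the thresholds used below (three copies, two locations) are hypothetical, of the kind a repository's own policy would add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyRule:
    # Schematized form of the policy sentence; field names are hypothetical.
    min_copies: int     # copies each object must have
    min_locations: int  # distinct locations those copies must span


def evaluate(rule, holdings):
    """Operationalize the rule against observed holdings.

    holdings: {object_id: [location, ...]} with one entry per copy; None
    marks a copy whose location the system cannot identify.
    Returns {object_id: [violation, ...]} for objects that fail.
    """
    failures = {}
    for obj, locations in holdings.items():
        problems = []
        if any(loc is None for loc in locations):
            problems.append("copy with unknown location")
        if len(locations) < rule.min_copies:
            problems.append(f"{len(locations)} copies < {rule.min_copies}")
        known = {loc for loc in locations if loc is not None}
        if len(known) < rule.min_locations:
            problems.append(f"{len(known)} locations < {rule.min_locations}")
        if problems:
            failures[obj] = problems
    return failures


rule = CopyRule(min_copies=3, min_locations=2)
holdings = {"obj-a": ["siteA", "siteB", "siteB"], "obj-b": ["siteA", None]}
print(evaluate(rule, holdings))
# {'obj-b': ['copy with unknown location', '2 copies < 3', '1 locations < 2']}
```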
The Dataverse Network ®
For Organizations
• Used by archives, libraries, journals, schools
• Enable contributors to upload data
• Organize studies by collections
• Search across a universe of data
• Control access and terms of use
• Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI
For Scholars
• Brand it like your own website
• Upload any type of data
• Establish a persistent data citation
• Facilitate data discovery
• Provide live analysis
• Receive permanent storage space
Dataverse Network – Designed for Research Data
Policy Support in the Dataverse Network
Access Control
• Roles: access, curation, administration
• Authenticate by: user, group, network, proxy
Workflow Policies
• Built-in versioning and deaccessioning
• Curatorial review: review of changes prior to release of a new version; review of new virtual archives
Legal Policies
• Terms of use: accounts, uploads, downloads
• Hierarchical terms: network, archive, study
• Access request workflow
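Hierarchical terms of use combined with role-based access can be sketched as follows. This is not the Dataverse Network's implementation: the `Level` type, the helpers, and especially the rule that terms accumulate across network, archive, and study (so a download needs all of them accepted) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    # One node in a hypothetical network -> archive -> study hierarchy.
    name: str
    terms: Optional[str] = None        # terms of use set at this level, if any
    parent: Optional["Level"] = None


def required_terms(level):
    """Collect terms from the study up to the network, outermost first."""
    terms = []
    while level is not None:
        if level.terms:
            terms.append(level.terms)
        level = level.parent
    return terms[::-1]


def may_download(study, accepted, restricted=False, roles=frozenset()):
    """Assumed rule: the user must accept every applicable set of terms,
    and a restricted study additionally requires the 'access' role."""
    if restricted and "access" not in roles:
        return False
    return all(t in accepted for t in required_terms(study))


network = Level("network", terms="network terms")
archive = Level("archive", parent=network)  # sets no terms of its own
study = Level("study", terms="study terms", parent=archive)
print(required_terms(study))                   # ['network terms', 'study terms']
print(may_download(study, {"network terms"}))  # False
```

An alternative design would let the most specific level override the others; which rule applies is a policy decision, and keeping it in one function makes it easy to change.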
Archival Collaboration through Shared Infrastructure: Data-PASS
Data-PASS is a broad-based partnership of social science data archives.
Data-PASS partners collaborate to:
• identify and promote good archival practices
• seek out at-risk research data
• mutually safeguard collections
• build preservation infrastructure
Data-PASS uses Dataverse:
• Creates a federated catalog
• Manages content for some partners
• Provides a simple way for organizations to participate in the partnership
Data-PASS uses SafeArchive:
• Collaboration through mutual replication of partner content
• Supports legal transfer agreements
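Mutual replication can be framed as a provisioning problem: place each partner's content on other partners without exceeding what they pledged. The Python sketch below uses a simple largest-first greedy heuristic; the function `assign_replicas` and its inputs are hypothetical, and greedy placement is not guaranteed to find a feasible assignment even when one exists.

```python
def assign_replicas(units, free_gb, copies):
    """Greedy sketch: place `copies` replicas of each unit on other partners.

    units: list of (unit_id, owner, size_gb); free_gb: {partner: pledged GB}.
    A partner never hosts its own content, and never two replicas of one
    unit. Returns (placements, unplaced); placements maps unit_id -> hosts.
    """
    free = dict(free_gb)  # work on a copy; leave the caller's pledges intact
    placements, unplaced = {}, []
    # Largest units first, since they are the hardest to fit.
    for unit, owner, size in sorted(units, key=lambda u: -u[2]):
        candidates = sorted(
            (p for p in free if p != owner and free[p] >= size),
            key=lambda p: -free[p],  # prefer partners with the most room left
        )
        if len(candidates) < copies:
            unplaced.append(unit)
            continue
        hosts = candidates[:copies]
        for p in hosts:
            free[p] -= size
        placements[unit] = hosts
    return placements, unplaced


pledges = {"A": 100, "B": 100, "C": 50}
units = [("a-1", "A", 40), ("b-1", "B", 40), ("c-1", "C", 30)]
placed, missing = assign_replicas(units, pledges, copies=2)
print(placed)   # {'a-1': ['B', 'C'], 'c-1': ['A', 'B']}
print(missing)  # ['b-1']
```

Here `b-1` cannot get two replicas because, after hosting `a-1`, partner C has only 10 GB of its pledge left; an audit report would surface this as a commitment shortfall.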
Where Do Policies Fit in Organizational Decisions?
[Figure: diagram relating NDSA, LOCKSS, MetaArchive, Data-PASS, SAFE, DVN, and iRODS]
Ideal integration of policy and technology?
• Expressed in domain/business language
• Translated to a formal schematization
• Automatically measured by technology
• Directly controls procedures & actions to achieve compliance
• Verifiable translation from business-domain policy
Where do we go from here?
• Combine the flexibility of iRODS with the semantic level of TRAC
• Self-documenting infrastructure
• Formal, verifiable translation of policy to schema, and schema to action
• Make good policy easy to implement!
Policy: a set of rules and objectives, expressed in a high-level domain, that controls actions at a lower level
Contact Us
Micah Altman
maltman.hmdc.harvard.edu
SafeArchive
safearchive.org
The Dataverse Network ™
thedata.org